The lack of traceable nerve pathways in the visual cortices of mammals has, in the last decade, generated speculation about machines which might learn to recognize visual patterns but which would be randomly wired to their source of information. This report analyzes such devices and shows that at least certain mild constraints of a distance-ordering sort must be imposed upon random connections. Mammalian visual system neuroanatomy is then re-examined in this light. This work was carried out under Project 6632 and Task 66325, and it was submitted for publication in February 1961.


Two neurons from the visual cortex of a cat. The staining technique used affects only a portion of all nerve cells but stains these rather thoroughly; in an actual brain the neurons are packed more closely together than this figure would suggest. The larger branching processes are dendrites concerned with receiving signals from other cells. An output axon is visible on the left as the small fiber descending from the base of the main cell body. (Illustration reproduced from The Organization of the Cerebral Cortex, by D. A. Sholl, through courtesy of Methuen & Co. Ltd., Copyright 1956.)
LEFT

X-ray photograph, lateral view, of head of normal eight-year-old child with the visual system drawn in its normal position. The optic nerve leaves the eyeball at right and divides to send fibers to the opposite geniculate and to receive fibers from the opposite eye at the optic chiasma, about an inch behind the eyeball. Each optic tract (only one of which is indicated) to the left of the chiasma thus contains fibers drawn more or less equally from both eyes, and ends in the lateral geniculate body in the center of the photograph. From here new fibers, the optic radiations, fan out to the visual cortex (striate cortex) at the rear of the brain, where their endings are indicated by heavy dots. (Illustration reproduced from The Vertebrate Visual System, by S. Polyak, through courtesy of the University of Chicago Press. Copyright 1957 by the University of Chicago.)


RIGHT 

Continuous mapping of the visual field through successive stations in the visual pathways culminating at the striate cortex. For each half of the visual field adjacent regions of the field map eventually onto adjacent regions of the cortex. (Illustration reproduced from Functional Neuroanatomy, by W. J. S. Krieg, McGraw-Hill Book Company, 1968, through courtesy of the author.)
In terms of specificity of organization we may conceive of pattern recognizing machines as arranged on a scale from the completely randomly connected at one end to precise configurations capable of responding only to pre-defined characters at the other. The former, of which several have been proposed in the literature, would be expected to acquire a "set" toward useful characteristics of patterns presented to them during a learning phase due to some built-in plastic property of their makeup, while the latter are merely templates responding only to exact copies of patterns initially designed into them. Some machines of this latter kind are now in actual productive operation sensing magnetic ink characters on bank checks and so on, but since their design features are so completely known they tend to lack speculative interest. The random machines, on the other hand, seem to have been arrived at by analogy to living networks of interconnected nerve cells whose apparent lack of traceable circuits (especially in the visual cortex of mammals) is one of the puzzles of neuroanatomy.

Many of the recently published studies on automatic pattern learning refer to devices well in the middle of this scale or toward the "highly organized" end of it, and workers in the art increasingly seem to feel that a strong initial organization of an hierarchical kind is a necessary feature of successful automata which can be taught to recognize patterns.

The puzzle of apparently random cortical connections found in mammalian visual systems remains, however, to suggest fruitful machine analogies toward the unorganized end of the scale. It is the purpose of this report to examine rather rigorously some consequences of random organization, which in turn is shown not to be practical in pure form. Very mild constraints are then placed upon otherwise random connection paths in an attempt to define a most general (that is, non-specific) class of workable machines. These constraints are stated in terms of distance in Euclidean space and, building upon the distance ordering of that space, lead logically to rather highly specific restrictions upon permissible connection modes. A brief survey of recent neurophysiological literature on the visual systems of cats, primates and man indicates that a corresponding specificity is encountered there as well, and that it is more geometrical and morphological than point-to-point.

In terms of the framework developed in this report it seems safe to conclude that machines in the immediate neighborhood of the random end of the scale can legitimately be excluded from future consideration and that plausible phylogenetic reasons exist for the observed continuous mapping of the visual field onto the occipital pole in mammals.

This report is intended for readers in neuroanatomy as well as computer technology. A glossary of terms is accordingly supplied, and the author respectfully requests indulgence in advance for obvious oversimplifications committed in the name of brevity. A less technical version for the general reader will be available as a separate document under the title "Elementary Pattern Perception in Machine and Mammal."




II Frames and Patterns


We will work with binary-quantized information for two important reasons: first, most equipment so far proposed for automatic pattern recognition makes use of digital processes on quantized data, and several digital computer programs have already been employed in related research (Farley and Clark; Selfridge; Rosenblatt; Bledsoe and Browning; Roberts; Doyle); and second, binary quantization makes it easy and profitable to use concepts from elementary combinatorial mathematics and the theory of finite point-sets.

By "pattern" we shall mean something like a meaningful picture. A pattern may be a radar PPI display of an air traffic situation, a time-slice of speech sounds recognizable by a human listener, a single typed or printed alphabetical character, a common geometrical shape on a two-dimensional spatial field, a sequence of Morse code signals to which meaning is assigned by convention, and so on. It will be understood that such patterns are quantized into n bits unless otherwise stated (Shannon). In particular it is convenient to imagine visual patterns binary-quantized in intensity and displayed upon a $\sqrt{n}$ by $\sqrt{n}$-bit surface. Any arbitrary arrangement of n bits will be called a "frame," a term adapted from radar and television usage. Thus all patterns are frames, but most frames are not patterns, the distinction being that patterns are frames which are humanly meaningful in some appropriate sense.

Where one pattern leaves off and another begins is usually clear in a given particular context but is somewhat difficult to tie down in the abstract. In the case of visual frames we may imagine temporal sequences of successive n-bit two-dimensional spatial patterns. In quantized speech, each pattern might be defined as some rather arbitrary time interval and its associated frequencies and amplitudes, so that sequences of patterns are successions of such intervals. (Indeed, one of the difficulties in speech work seems to be the lack of consistent partitions between adjacent patterns, dependent as these are upon the varying modes of articulation characteristic of the human speech-generating apparatus.) Whatever pattern-defining dimensional scheme is used in some particular instance, however, will also serve to define the corresponding frames, and frames and patterns alike will be assumed quantized into n bits as previously stated.

Problems in automatic pattern recognition usually have as their goal the development of a computer program or of a machine which will give a predetermined response when excited by any member of some class of equisignificant patterns. By "equisignificant" we mean that two or more patterns have the same meaning to the machine's designers and operators (Reichenbach; Selfridge, 1955). The crucial point here is that equisignificance, being a semantic property, is a matter of human decision or of historical convention arising from the evolution of shared experiences, and automatic pattern recognition, even in principle, is possible at all only because the notion of equisignificance is reasonably well correlated with certain informational properties of the patterns themselves. These properties, in terms of which human beings classify patterns according to equisignificance, in turn depend heavily upon such fundamental ordering concepts of topological and metric spaces as contiguity, continuity, connectivity, and so on, at least in most practically useful instances. Examples of such spaces are the two-dimensional Euclidean metric space, appropriate to visual patterns, and the pair of continuous one-dimensional spaces in frequency and intensity often used in speech analysis.

Many machine proposals for pattern recognition involve some self-adaptive or "learning" function, by means of which a machine can be "taught" to respond to a given equisignificance class by presenting to it a sequence of patterns drawn from that class, with or without human monitoring. In the case of quantized patterns it is obvious that temporally adjacent members of such a sequence may have several bits in common; this phenomenon will be referred to as "overlap." Mathematical analysis will be facilitated, however, if we confine our attention to sequences of patterns in which the overlaps are more or less random, so that order in time is not important. Disregarding temporal order, then, we may speak simply of classes of patterns and of frames, and any given sequence defines such a class.

It is clear that there are $2^n$ distinct frames possible, corresponding to the $2^n$ ways of assigning values either 0 or 1 to n bits. Further, we may define frame classes by considering sequences of 0, 1, 2, 3, ... frames; a class of k frames can be constructed from the $2^n$ possible in $\binom{2^n}{k}$ different ways, the number of ways in which $2^n$ things can be chosen k at a time. Thus where C is the class of frame classes and N(C) is the number of members it contains,

$$N(C) = \sum_{k=0}^{2^n} \binom{2^n}{k} = 2^{2^n}.$$
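The counting argument can be checked by brute force for very small frames. The following Python sketch is illustrative only: it encodes each frame as an integer below $2^n$ (an encoding assumed here for convenience) and compares an explicit list of all frame classes with the closed form $2^{2^n}$.

```python
from itertools import combinations
from math import comb

def count_frame_classes(n: int) -> int:
    """N(C) as the sum of C(2^n, k) over k = 0 .. 2^n."""
    frames = 2 ** n
    return sum(comb(frames, k) for k in range(frames + 1))

def enumerate_frame_classes(n: int) -> set:
    """Every class (subset) of the 2^n possible n-bit frames, built explicitly."""
    frames = range(2 ** n)  # each frame encoded as an integer 0 .. 2^n - 1
    return {frozenset(c) for k in range(2 ** n + 1) for c in combinations(frames, k)}

for n in range(1, 4):
    closed_form = 2 ** (2 ** n)
    assert count_frame_classes(n) == closed_form
    assert len(enumerate_frame_classes(n)) == closed_form
    print(n, closed_form)  # 1 4, then 2 16, then 3 256
```

Explicit enumeration is only feasible for tiny n; the binomial sum is what makes the closed form usable in the argument that follows.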

That N(C) is an upper bound is easy to prove by indirection. Suppose we have an (N(C) + 1)st class of frames allegedly not to be found in C. This class will have some definite number, say $l \le 2^n$, of frames as its members. But in the above enumeration defining N(C) we have already included all possible l-frame classes in the series of $\binom{2^n}{l}$ terms; hence our supposedly new class has already been counted in the enumeration.


III Metrics and Classes on n-Bit Spaces
Consider a two-dimensional $\sqrt{n}$ by $\sqrt{n}$-bit frame whose binary elements are arbitrarily labeled a, b, c, ..., in some order. If these labels are assigned to the frame elements randomly, as by consulting a table of random numbers or by drawing the labels from a hat as the frame elements are systematically scanned, then no order of label assignation can be called "preferred" over any other order. In particular, any two labels may be transposed without affecting the randomness of the labeling, and any arrangement of labels can be generated from any other by a finite sequence of pairwise interchanges. What meaning can be given to the notion of "distance" between pairs of points randomly labeled in this fashion?

It is commonly agreed (Busemann) that any distance-defining function, or metric, on a space of n points must meet at least the following four conditions. Where $d(x, y)$ is such a metric and x, y, z are any three of the n points, $d(x, y)$ defines a real number satisfying

I. $d(x, y) > 0$, $x \neq y$

II. $d(x, x) = 0$

III. $d(x, y) = d(y, x)$

IV. $d(x, y) + d(y, z) \geq d(x, z)$.

But we have just seen that for the random n-bit frame we may arbitrarily transpose the labels of any two elements; hence any acceptable metric must also be invariant under transposition. The only metric meeting this condition, as well as I through IV above, is

$$d(x, y) = \text{constant}, \quad \text{for all } x \neq y.$$
We prove this by assuming its contradiction: suppose $d'(x, y)$ is some proposed metric such that, for some x, y, z,

$$d'(x, y) \neq d'(x, z).$$

Then either

$$d'(x, y) > d'(x, z) \quad \text{or} \quad d'(x, y) < d'(x, z)$$

but not both. Regardless of which of these two inequalities is assumed true, it can be falsified by transposing y and z, which yields an equally admissible random labeling of the same frame. Hence

$$d'(x, y) = d'(x, z),$$

equivalent to stating that the distance between any two non-identical points is constant, and their distances from one another cannot be ordered save in a trivial sense.
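The transposition argument can be checked exhaustively on a toy case. This Python sketch assumes a 2-by-2 frame (four labels) and, purely for illustration, restricts candidate distance assignments to the values 1 and 2. It tests invariance under relabeling only (not conditions I through IV), showing that ordinary Euclidean distance for one particular labeling fails the test and that only constant assignments pass.

```python
import math
from itertools import combinations, permutations, product

POINTS = range(4)                        # labels a, b, c, d of a 2-by-2 frame
PAIRS = list(combinations(POINTS, 2))    # the six unordered pairs x != y

def is_invariant(dist: dict) -> bool:
    """True if d(pi(x), pi(y)) == d(x, y) for every relabeling pi of the points."""
    for pi in permutations(POINTS):
        for x, y in PAIRS:
            image = tuple(sorted((pi[x], pi[y])))
            if dist[image] != dist[(x, y)]:
                return False
    return True

# Euclidean distance for one particular placement of the labels on the grid.
coords = {0: (0, 0), 1: (0, 1), 2: (1, 0), 3: (1, 1)}
euclid = {(x, y): math.dist(coords[x], coords[y]) for x, y in PAIRS}
print(is_invariant(euclid))  # False: diagonals are longer than edges

# Try every assignment of the values 1 or 2 to the six pairs.
invariant = [vals for vals in product((1, 2), repeat=len(PAIRS))
             if is_invariant(dict(zip(PAIRS, vals)))]
print(invariant)  # only the two constant assignments survive
```

Because every pair of points can be carried onto every other pair by some relabeling, any invariant assignment is forced to be constant, which is the conclusion reached above.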

On the random n-bit frame, then, we have no way of adequately taking into account order- and metrically-derived properties of patterns other than identity or non-identity of bits. In particular we have no way of describing the following humanly important properties of points a, b, c, ..., comprising patterns:

1. Point a is farther from b than from c.

2. a, b, c, d, ... are the only points equidistant from p.

3. a is adjacent to b (that is, there is no point c such that $d(a, c) < d(a, b)$).

4. Points $\{a_i\}$ are such that $a_i$ is adjacent to $a_{i+1}$, so that $a_k$ to $a_l$, $k \le i \le l$, form a continuous "line."

5. Points $\{a_i\}$, $k \le i \le l$, describe a simply-connected closed figure. That is, only $a_{i-1}$ and $a_{i+1}$ are adjacent to $a_i$, $a_k = a_l$, and no other $a_i = a_j$, $i \neq j$, $k \le i, j \le l$.
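By contrast, once frame elements keep fixed positions, properties 3 through 5 become mechanically decidable. This is a minimal Python sketch, assuming a hypothetical 5-by-5 frame and ordinary Euclidean distance between element coordinates; the function names are illustrative, not taken from the report.

```python
import math
from itertools import product

SIDE = 5  # hypothetical frame: sqrt(n) = 5, so n = 25 elements
POINTS = list(product(range(SIDE), repeat=2))  # (row, column) coordinates

def adjacent(a, b) -> bool:
    """Property 3: no point c other than a lies strictly closer to a than b does."""
    if a == b:
        return False
    d_ab = math.dist(a, b)
    return all(math.dist(a, c) >= d_ab for c in POINTS if c != a)

def is_line(seq) -> bool:
    """Property 4: every a_i is adjacent to a_{i+1}."""
    return all(adjacent(u, v) for u, v in zip(seq, seq[1:]))

def is_closed_figure(seq) -> bool:
    """Property 5: a_k == a_l, no other repeated points, and within the figure
    each a_i is adjacent only to its predecessor and successor."""
    if len(seq) < 4 or seq[0] != seq[-1]:  # need at least three distinct points
        return False
    ring = seq[:-1]
    if len(set(ring)) != len(ring):
        return False
    m = len(ring)
    for i, p in enumerate(ring):
        expected = {ring[i - 1], ring[(i + 1) % m]}  # ring[-1] wraps around
        if {q for q in ring if adjacent(p, q)} != expected:
            return False
    return True

square = [(1, 1), (1, 2), (1, 3), (2, 3), (3, 3), (3, 2), (3, 1), (2, 1), (1, 1)]
ladder = [(1, 1), (1, 2), (1, 3), (2, 3), (2, 2), (2, 1), (1, 1)]  # has a "rung"
print(adjacent((0, 0), (0, 1)), adjacent((0, 0), (1, 1)))  # True False
print(is_line(square), is_closed_figure(square))           # True True
print(is_closed_figure(ladder))                            # False
```

None of these tests could be written for the random frame, since each one relies on comparing distances that, as shown above, must all be equal there.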


The reasoning behind the preceding argument can be put another way. Let us imagine a storeroom filled with all possible n-bit random frames, from which one is blindly chosen. After some examination, suppose we assert that we can define a metric $d'(x, y)$ in terms of which $d'(x, y) > d'(x, z)$ for some triples of points on that frame. Then there exists in the storeroom at least one other frame for which $d'(x, y) < d'(x, z)$ for some of the x, y, z's in question, namely that frame generable from the original by appropriate interchanges of y's and z's. Since that second frame might just as well have been chosen initially, we must conclude that the only metric holding for all frames is one which defines some constant distance between all non-identical x's and y's. Thus again we have no way of describing adjacency, connectivity, and other related properties of patterns, and we must conclude that only non-ordered (or trivial, such as overlapping) properties can meaningfully be treated in the case of random frames.

The term "random automaton" is introduced here in a loosely-defined way to refer to any pattern recognizing learning machine whose input is derived from a random frame, that is, whose initial structure or organization is essentially unaffected by permuting elements of the input frame. Such machines have been discussed by Sholl and Uttley; Uttley; Rosenblatt; Day and Newman; Minsky and Selfridge. It is especially interesting to consider the behavior of these devices after excitation by a sequence of patterns impinging upon the input elements with no concomitant reinforcing and inhibiting signals from the experimenter.

We have seen that a useful metric cannot be defined for such a system because, being random, the initial connections between input elements and the rest of the device can be arbitrarily permuted. Consider now two such automata differing only in some permutation P of the input elements of the second with respect to those of the first. Both are random automata, and both will respond identically (we here presuppose no internal noise source) if the first is exposed to some n-bit pattern and the second to that same pattern permuted by P. Yet P may be any permutation, and may transform the pattern into what appears to be a randomly-spaced collection of 1's and 0's on the n-bit frame. Conversely, there exists a permutation P' which transforms an arbitrary arrangement of bits into some prespecified pattern having the same proportion of 1's and 0's. We cannot reasonably expect, then, that a random automaton will be biased toward responding to (humanly important) patterns to a greater degree than toward random arrays of bits, provided the temporal distribution of overlaps is also approximately random in both types of sequences. In particular, there is no reason a priori to expect response discrimination in favor of some one property with respect to which all patterns in a sequence are equisignificant.
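The permutation argument can be made concrete with a toy model. The Python sketch below is hypothetical throughout: `make_automaton`, its threshold units, fan-in and ±1 weights are assumptions standing in for any randomly wired device, not a machine described in the report. It shows that a second automaton wired through a permutation P responds to the permuted pattern exactly as the first responds to the original, and it constructs a permutation P' that carries a scattered frame back onto a prespecified pattern with the same number of 1's.

```python
import random

def make_automaton(n, units, fan_in, rng):
    """Hypothetical random automaton: each threshold unit reads fan_in randomly
    chosen input elements, each with a random weight of +1 or -1."""
    return [[(rng.randrange(n), rng.choice((-1, 1))) for _ in range(fan_in)]
            for _ in range(units)]

def respond(automaton, frame):
    """Binary response vector: a unit fires when its weighted input sum is positive."""
    return [int(sum(w * frame[i] for i, w in unit) > 0) for unit in automaton]

def rewire(automaton, perm):
    """Second automaton: wherever the first reads input i, it reads input perm[i]."""
    return [[(perm[i], w) for i, w in unit] for unit in automaton]

def permute_frame(frame, perm):
    """Move the bit at position i to position perm[i]."""
    out = [0] * len(frame)
    for i, bit in enumerate(frame):
        out[perm[i]] = bit
    return out

def permutation_onto(source, target):
    """Build P' carrying frame `source` onto `target` (equal numbers of 1's)."""
    if sum(source) != sum(target):
        raise ValueError("frames must contain the same number of 1's")
    ones = [i for i, b in enumerate(target) if b]
    zeros = [i for i, b in enumerate(target) if not b]
    return [ones.pop() if bit else zeros.pop() for bit in source]

rng = random.Random(1961)
n = 25  # a 5-by-5 frame
first = make_automaton(n, units=8, fan_in=6, rng=rng)
P = list(range(n))
rng.shuffle(P)
second = rewire(first, P)

bar = [1 if i % 5 == 2 else 0 for i in range(n)]  # vertical bar in column 2
scattered = permute_frame(bar, P)                 # usually looks like noise
assert respond(first, bar) == respond(second, scattered)

P_prime = permutation_onto(scattered, bar)
assert permute_frame(scattered, P_prime) == bar
```

Nothing in the wiring of `first` distinguishes `bar` from `scattered`; which of the two "looks like a pattern" depends only on the geometry of the frame, which a random automaton does not see.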

Yet if the random automaton has certain characteristics of instability, so that it tends to favor responses toward some characteristic of a sequence of patterns presented to it as the sequence proceeds, how can we evaluate the chances of a given property being singled out for such response reinforcement? Better stated, after t patterns have been "shown" to the device, how can we predict which of the remaining $2^n - t$ frames will elicit a positive response from it? To do this we clearly must form an idea of the maximum number of possible ways of categorizing frames and some estimate of their relative probabilities. It has already been shown that classes of patterns defined according to metric or ordering properties can expect no preferential treatment in a random automaton, and in Section II we derived the result that there are

$$N(C) = 2^{2^n}$$

distinct possible abstract classes of n-bit frames. Any sequence of $2^n$ or fewer distinct patterns must fall into one or another of these classes (disregarding time order as before, on the assumption of random time-distributions of overlaps), and in fact a sequence of $t < 2^n$ distinct patterns will find membership in

$$2^{2^n - t}$$

classes simultaneously. This follows because once t of the possible $2^n$ frames have been employed there remain $2^n - t$ of them, each of which may be included or not; these can therefore be combined into a total of

$$2^{2^n - t}$$

classes, and the class sum of each of these with the class containing the original t patterns generates a like number of classes, every one of which contains all t patterns.
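For a frame small enough to enumerate, the count of classes containing a given set of t patterns can be confirmed directly. A Python sketch, assuming n = 3 (eight possible frames) with frames encoded as integers:

```python
from itertools import combinations

def classes_containing(n: int, shown: set) -> int:
    """Count the classes (subsets of all 2^n frames) that include every shown frame."""
    frames = range(2 ** n)
    return sum(
        1
        for k in range(2 ** n + 1)
        for cls in combinations(frames, k)
        if shown <= set(cls)
    )

n = 3
for t in range(4):
    shown = set(range(t))  # any t distinct frames would do
    count = classes_containing(n, shown)
    print(t, count)        # 0 256, then 1 128, then 2 64, then 3 32
    assert count == 2 ** (2 ** n - t)
```

Each additional pattern shown halves the number of candidate classes, but starting from $2^{2^n}$ that still leaves an astronomically large set for any realistic frame.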
Under these conditions it is useful to attempt to estimate the probability that a random automaton will select the class of equisignificant patterns which the operator has in mind, from the set of possible classes also containing the same patterns.

Let E be a class of patterns equisignificant to the experimenter because all share some predefined property recognizable by him. We specify further that E is well-defined in the sense that the experimenter can always decide unambiguously whether or not any given frame belongs in E, and that E contains all patterns which share the given criterion of similarity. Obviously, for E to be useful,

$$N(E) < 2^n.$$

In addition it may easily turn out that

$$N(E) > t,$$

in which case there are

$$\binom{N(E)}{t}$$

distinct pattern classes (distinct t-term sequences disregarding order) formable from the members of E. The random automaton is to build up response behavior to some class of patterns, preferably to E, so that it hopefully will give an unambiguous positive response only when shown a (t + 1)st member of E. What are the chances of its accomplishing this? We have noted that

$$2^{2^n - t}$$

members of C contain all t patterns and that only one of these classes is E, but lack of mutual exclusion complicates matters. If we consider those classes having $t + 1, t + 2, \ldots, t + k$ frames as members and also including the initial t patterns, there are

$$2^n - t, \quad (2^n - t)(2^n - t - 1), \quad \ldots, \quad (2^n - t)(2^n - t - 1)\cdots(2^n - t - k + 1) \approx 2^{nk} \quad \text{for } 2^n \gg t + k$$

classes respectively containing these patterns, the ones to the right including all those to their left. Let us consider the first of these categories only, the $2^n - t$ classes each of t + 1 members, of which t are found in E, and let us assume for the moment that E has precisely t + 1 members. Since our machine is random in the sense already discussed it seems reasonable to assume that each of these $2^n - t$ classes is close to being equiprobable, that is, that the chances are roughly the same that it will have organized itself to any one of the $2^n - t$ possible classes containing t members plus one more frame. Hence it appears that its chances of responding favorably to the (t + 1)st member of E are about

$$(2^n - t)^{-1}.$$

If there were two patterns remaining in E after the first t of them had been presented to the automaton, the corresponding chance of proper response to the remaining members of E would be about

$$\big((2^n - t)(2^n - t - 1)\big)^{-1},$$
and so on, the probability decreasing by the factor $(2^n - t - k)^{-1}$ as we go from the kth to the (k + 1)st pattern in E after the first t of them. In any event it is clear that these probabilities are quite small unless t is rather close to $2^n$. But for the machine to be useful in any real sense t should be very substantially less than $2^n$, or else there will not be much capacity for new "experience" left in the device. For a rather minimal frame structure of 25 by 25 bits, $2^n = 2^{625}$, and it is clear that no actual sequence of stimuli can approach such a value in number of terms, thus effectively meeting this last condition. But by the same phenomenon the chance of the machine correctly responding only to a (t + 1)st pattern in E is almost vanishingly small.
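The size of these probabilities is easier to appreciate numerically. This Python sketch evaluates the text's ordered-product estimate in base-10 logarithms, because the products quickly exceed floating-point range; the value t = 1,000,000 is an arbitrary assumption chosen only for illustration.

```python
import math

def log10_chance(n: int, t: int, k: int) -> float:
    """log10 of 1 / [(2^n - t)(2^n - t - 1) ... (2^n - t - k + 1)],
    the estimate for k members of E remaining after t shown patterns."""
    frames = 2 ** n  # exact Python integer, no overflow
    return -sum(math.log10(frames - t - j) for j in range(k))

n = 25 * 25      # a 25-by-25 frame, so 2^n = 2^625
t = 1_000_000    # assumed number of patterns already shown
for k in (1, 2, 3):
    print(k, round(log10_chance(n, t, k), 1))
# prints 1 -188.1, then 2 -376.3, then 3 -564.4
```

Even a million training patterns leaves the chance of a correct single response near $10^{-188}$, since t is negligible beside $2^{625}$.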

The preceding discussion assumed that all of the equisignificant patterns comprising E were known, in the sense that an exhaustive enumeration was possible. We now turn to classification of patterns in terms of some similarity property or properties shared by them, rather than in terms of a complete enumeration of the members of each class. As before, however, our objective is to estimate the number of such classes which can exist and hence to form a rough guess as to the chances of a random pattern-learning automaton's adapting itself to respond only to patterns of one given class. It is again assumed that no human monitoring of the device occurs as the trials proceed. Suppose some property, p, is "squareness," and we stimulate a random automaton with a sequence $S = S_1, S_2, S_3, \ldots, S_t$ of frames, each one of which is a square pattern or not, making no attempt to enumerate the total possible number of square patterns. Clearly the members of S can be grouped into two classes (assuming random overlapping as before) according to whether or not their members are square patterns. In conventional set-theoretic notation these two classes are

$$\{x \mid px\},$$

the class of all x's which have the property p (it being assumed that x is in S), and

$$\{x \mid p'x\},$$

the class of all x's in S which do not have the property p. We now introduce another property, q, not related to p, such as thick or thin lined figures; that is, $\{x \mid qx\}$ is the class of all thick-lined figures in S and $\{x \mid q'x\}$ is the class of all thin-lined figures, every pattern in S being either a thick-lined or a thin-lined figure. In terms of both p and q it is clear that we can define four disjoint subclasses of S as follows:

$$c_0 = \{x \mid p'x \cdot q'x\},$$

the class of all x's in S which do not have the property p and do not have the property q, that is, which are neither square nor thick-lined figures;

$$c_1 = \{x \mid p'x \cdot qx\}, \quad c_2 = \{x \mid px \cdot q'x\}, \quad \text{and} \quad c_3 = \{x \mid px \cdot qx\}.$$

Now these four disjoint subclasses can be logically summed in sixteen distinct but not necessarily mutually exclusive classes which categorize the members of S in terms of the two properties p and q (in the notation to follow "+" is read "or," just as "·" stands for "and"):

$$
\begin{aligned}
C_0 &= \text{null class} \\
C_1 &= c_0 = \{x \mid p'x \cdot q'x\} \\
C_2 &= c_1 = \{x \mid p'x \cdot qx\} \\
C_3 &= c_2 = \{x \mid px \cdot q'x\} \\
C_4 &= c_3 = \{x \mid px \cdot qx\} \\
C_5 &= c_0 \cup c_1 = \{x \mid p'x\} \\
C_6 &= c_0 \cup c_2 = \{x \mid q'x\} \\
C_7 &= c_0 \cup c_3 = \{x \mid p'x \cdot q'x + px \cdot qx\} \\
C_8 &= c_1 \cup c_2 = \{x \mid p'x \cdot qx + px \cdot q'x\} \\
C_9 &= c_1 \cup c_3 = \{x \mid qx\} \\
C_{10} &= c_2 \cup c_3 = \{x \mid px\} \\
C_{11} &= c_0 \cup c_1 \cup c_2 = \{x \mid p'x + q'x\} \\
C_{12} &= c_0 \cup c_1 \cup c_3 = \{x \mid p'x + qx\} \\
C_{13} &= c_0 \cup c_2 \cup c_3 = \{x \mid px + q'x\} \\
C_{14} &= c_1 \cup c_2 \cup c_3 = \{x \mid px + qx\} \\
C_{15} &= c_0 \cup c_1 \cup c_2 \cup c_3 = S
\end{aligned}
$$

If our automaton has the capability of organizing itself to respond to patterns of just one class after t trials, as would be the case if it has but one binary output indicator, then even if it were somehow restricted to classifications only in terms of p and q it is far from obvious which of the above sixteen classes would be selected.

On a random basis and assuming that $c_0$ through $c_3$ have about the same number of members we might estimate that $C_{15}$ would be the most favored class, followed by $C_{11}$ through $C_{14}$, and so on.
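The sixteen unions can be generated mechanically. In this Python sketch the subclasses $c_0, \ldots, c_3$ are represented by the integers 0 to 3 (an encoding chosen here for convenience), the classes are produced in the order listed above (by number of subclasses, then lexicographically), and the Boolean descriptions of $C_5$ through $C_{14}$ are checked against truth tables.

```python
from itertools import combinations, product

def subclass_index(p: bool, q: bool) -> int:
    """c0 = p'q', c1 = p'q, c2 = pq', c3 = pq, encoded as 2*p + q."""
    return 2 * int(p) + int(q)

def sixteen_classes():
    """C0 .. C15 in the listed order: by number of subclasses, then lexicographically."""
    return [set(combo) for size in range(5) for combo in combinations(range(4), size)]

def truth_table(indices):
    """The (p, q) combinations admitted by a union of subclasses."""
    return {(p, q) for p, q in product((False, True), repeat=2)
            if subclass_index(p, q) in indices}

C = sixteen_classes()
formulas = {  # descriptions from the list, with "+" as or and the dot as and
    5: lambda p, q: not p,
    6: lambda p, q: not q,
    7: lambda p, q: (not p and not q) or (p and q),
    8: lambda p, q: (not p and q) or (p and not q),
    9: lambda p, q: q,
    10: lambda p, q: p,
    11: lambda p, q: (not p) or (not q),
    12: lambda p, q: (not p) or q,
    13: lambda p, q: p or (not q),
    14: lambda p, q: p or q,
}
for i, f in formulas.items():
    expected = {(p, q) for p, q in product((False, True), repeat=2) if f(p, q)}
    assert truth_table(C[i]) == expected, f"C{i} does not match its description"
print(len(C), C[0], C[15])
```

Generating the classes as unions of disjoint subclasses, rather than from formulas, guarantees the list is exhaustive; the formulas are then only a readable label for each union.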

Introduction of a third property, r, would have expanded the number of the $C_i$ to 256 (three properties give $2^3 = 8$ disjoint subclasses and hence $2^8 = 256$ unions), and so forth, a process which approaches but cannot exceed n properties since N(C) cannot be greater than $2^{2^n}$, as was proved in Section II.

The present analysis of pattern learning behavior in terms of properties also differs from our earlier treatment of enumerated classes secondarily in the makeup of S. In the earlier treatment we dealt with a class E of equisignificant patterns which was included in S but made no comment about those frames in S which were not in E other than to require that E be well-defined. In the context of the present discussion, on the other hand, E would be but one of several classes (the $C_i$) defined by the properties of its members with respect to which they are equisignificant. Taken collectively the $C_i$ exhaust S; reference to the list above will verify that

$$C_i \cup C_{15-i} = S,$$

and in general it can be shown that

$$C_i \cup C_{2^{2^k}-1-i} = S$$

for k properties, provided the listing is carried out as illustrated above. Expressions of the form

$$C_i \cup C_{3-i} = S$$

also refer to "self-dichotomizing" behavior in terms of a single property p ($k = 1$, for which $2^{2^k} - 1 = 3$), and by implication in terms of additional properties q, r, ...; that is, partitioning of response behavior without regard to the sign of the output binary response.
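The general pairing $C_i \cup C_{2^{2^k}-1-i} = S$ depends on the listing order. A brute-force Python check, assuming the classes are listed as above (by number of subclasses, then lexicographically) with the $2^k$ disjoint subclasses encoded as integers:

```python
from itertools import combinations

def class_listing(k: int):
    """All 2^(2^k) classes for k properties, in the listed order: unions of the
    2^k disjoint subclasses, by number of subclasses, then lexicographically."""
    m = 2 ** k  # number of disjoint subclasses c_0 .. c_{m-1}
    return [frozenset(c) for size in range(m + 1) for c in combinations(range(m), size)]

for k in (1, 2, 3):
    C = class_listing(k)
    everything = frozenset(range(2 ** k))
    last = 2 ** (2 ** k) - 1
    assert len(C) == last + 1  # 4, 16 and 256 classes
    # Each class and its partner C_{last - i} are complements: together they exhaust S.
    assert all(C[i] | C[last - i] == everything and not (C[i] & C[last - i])
               for i in range(last + 1))
    print(k, len(C))
```

The pairing works because complementing sets of a fixed size reverses lexicographic order, so each class sits at the mirror-image position of its complement in the list.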

Both the earlier and the present approach have shown that a large plurality of possible classifications exist potentially in addition to the one to which the experimenter would have his random learning machine adapt itself, and hence that adaptation is much more likely than not to occur with respect to some undesired pattern category unless he actively interferes with the process. Further, in the case of external reinforcement of desired adaptive behavior and inhibition of undesired behavior, these analyses suggest that the experimenter may be forced to wait an excessively long time for favorable responses to appear if the machine is somehow modified by chance between trials. We will therefore find it profitable to examine some simple constraints upon an otherwise random automaton which favor machine classification of patterns in terms of a distance measure.
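As a rough illustration of how many classifications a random machine could settle on, the following Python sketch (an illustration, not part of the original analysis) counts the binary classifications of the $2^k$ combinations of k binary properties, with and without regard to the sign of the response. It assumes every combination of property values can occur; the function name is hypothetical.

```python
def count_classifications(k: int) -> tuple[int, int]:
    """Return (signed, unsigned) counts of binary classifications.

    With k binary properties there are 2**k distinct property combinations.
    A binary response assigns 0 or 1 to each combination, giving 2**(2**k)
    signed classifications; ignoring the sign of the response pairs each
    classification with its complement, which halves the count.
    """
    cells = 2 ** k
    signed = 2 ** cells
    unsigned = signed // 2
    return signed, unsigned


for k in range(1, 5):
    signed, unsigned = count_classifications(k)
    print(f"k={k}: {signed} signed, {unsigned} ignoring sign")
```

Even four properties already admit tens of thousands of partitions, only one of which the experimenter wants.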


IV Mappings
Whether a pattern-recognizing learning automaton is operated in a spontaneous-learning mode or is reinforced and inhibited by a human operator who decides whether its performance at each trial is acceptable or not, a considerable gain in effectiveness may be obtained if machine organization is arranged to take advantage of order- and metric-derived properties of patterns such as those listed on page 8. One would then expect that time wasted by undesirable responses could be reduced substantially. In fact, an important topic beyond the scope of this paper has to do with optimum degrees of organization initially designed into learning machines; by analogy with biological organisms there is reason to suspect that considerable initial organization of a functionally hierarchical kind has strong advantages, and many learning machines have been designed with this in view. In the brief discussion which follows, however, we will take but a short step from the random automata treated earlier, and will examine some aspects of including a distance-measuring capability in terms of which metrically derived properties of patterns can have machine correlates. We therefore abandon random frames to consider only those in which arbitrary transposition of frame bits prior to learning activity is not admissible. Our restrictions upon frame bit interchangeability will be stated in terms of two simple and rather minimal constraints: one upon effects produced by simultaneously stimulated frame bits in terms of their mutual spacing, and one upon effects produced by activated machine elements as a function of the physical distances between them. We will then be in a position to discuss preferred mappings, or connection paths, from the input frame to a set of machine elements.
The first constraint ($C_1$) to be adopted is that pairs of frame bits simultaneously stimulated shall, other things being equal, tend to have more influence upon machine organization and behavior the closer together they lie in the input frame, with equal influences for equal separations. By "tend to have" we shall mean that $C_1$ is a constraint upon an otherwise random probability distribution with respect to location of bits in the input frame. One would expect, in a not completely homogeneous machine, that some of its parts will have greater effect upon its behavior and organization than others; $C_1$ merely requires us to associate such parts or elements more frequently with close frame bits than with more widely separated ones, according to some properly monotonic inverse function of their separation, for the avowed purpose of emphasizing organization of machine functional substructures in response to adjacent and quite close pattern bits. These can then (a topic not pursued in this paper) serve as building blocks for recognizing patterns designed according to metrically important schemes, a category including practically all patterns for which machine-learned recognition is desirable.

Since we deal with pairs x, y of input frame bits, and have been speaking of tendencies toward organizational changes rather than rigidly determined ones (although this may be required merely by our incomplete knowledge of the machine's structure), constraint $C_1$ lends itself to easy expression in terms of conditional probabilities. Let f(x, t) be some organizational change which may occur at the t-th learning trial depending upon the value, 1 or 0, of x, and let $p_f(x, t)$ be the probability that f(x, t) actually does occur for x = 1. (We omit consideration of x = 0 except to note that the functional forms in t of f(1, t) and f(0, t) need not be simply related to each other, and assume merely that they are not so antagonistic as to vitiate $C_1$ over many trials.) Then before $C_1$ is imposed, $p_f(x, 1) = p_f(y, 1)$ for all x, y, since nothing can be said a priori about preferential effects of stimulated bits in the input frame of a random automaton at the first trial. If we then assume, as in Section III, that successive patterns at t = 1, 2, 3, ... have the same proportion of 1's and 0's and that overlaps are random as well, we have $p_f(x, t) = p_f(y, t)$, and the conditional probability

$$p_f(x \mid y) = p_f(x, t \mid y = 1),$$

that f(x, t) will occur at the t-th trial given y = 1, is symmetric in x and y and quite independent of the location (x, y) of these two stimulated input bits. Now $C_1$ merely implies that $p_f(x \mid y)$ will increase as x and y draw closer together:

$$d_e(x, y) < d_e(w, z) \implies p_f(x \mid y) > p_f(w \mid z) \qquad (C_1)$$

where equalities and inequalities hold together, x may or may not be identified with w, and $d_e$ is distance in a Euclidean space $E^2$ of two dimensions (the input frame).
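Constraint $C_1$ can be checked mechanically for any proposed $p_f$. In this sketch $p_f(x \mid y)$ is taken, purely as an assumption, to be a decreasing exponential of the separation $d_e(x, y)$ on a 3 by 3 frame; the parameter values are arbitrary. A constant $p_f$, as in the unconstrained random automaton, fails the check.

```python
import itertools
import math


def p_f(d: float, base: float = 0.1, amp: float = 0.5, scale: float = 1.0) -> float:
    """Hypothetical p_f(x | y) as a function of the separation d = d_e(x, y).

    Any strictly decreasing function of d satisfies C1; the exponential form
    and the parameter values are illustrative assumptions only.
    """
    return base + amp * math.exp(-d / scale)


def satisfies_c1(points, prob) -> bool:
    """Check C1 over all pairs of point pairs (x, y), (w, z).

    d(x, y) < d(w, z) must go with p(x|y) > p(w|z), and equal separations
    with equal probabilities (equalities and inequalities hold together).
    """
    pairs = list(itertools.combinations(points, 2))
    for (x, y), (w, z) in itertools.product(pairs, repeat=2):
        d1, d2 = math.dist(x, y), math.dist(w, z)
        p1, p2 = prob(d1), prob(d2)
        if (d1 < d2) != (p1 > p2) or (d1 == d2) != (p1 == p2):
            return False
    return True


grid = [(i, j) for i in range(3) for j in range(3)]
print(satisfies_c1(grid, p_f))             # True
print(satisfies_c1(grid, lambda d: 0.3))   # False: no preference for close pairs
```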

Of all ways of implementing this requirement, perhaps that involving the least stringent a priori assumptions as to structural configuration can be made to follow from a similar constraint ($C_2$) upon the machine elements themselves: pairs of such elements, when simultaneously stimulated by signals from the input frame, shall tend to have more influence upon machine organization the closer together they lie in the machine, and equal separations shall, by and large, imply equal influences. Thus

$$d_e(\mathbf{x}, \mathbf{y}) < d_e(\mathbf{w}, \mathbf{z}) \implies p_f(\mathbf{x} \mid \mathbf{y}) > p_f(\mathbf{w} \mid \mathbf{z}) \qquad (C_2)$$

where equalities and inequalities hold together; $\mathbf{x}, \mathbf{y}, \mathbf{w}, \mathbf{z}$ are machine elements connected to the input frame, and $\mathbf{x}$ may or may not be identified with $\mathbf{w}$; $f(\mathbf{x}, t)$ is some function describing a change in machine organization which may occur for $\mathbf{x} = 1$, that is, when $\mathbf{x}$ receives a stimulus from some x = 1 in the input frame; $p_f(\mathbf{x} \mid \mathbf{y})$ is the probability of its occurrence given $\mathbf{y} = 1$, with $p_f(\mathbf{x} \mid \mathbf{y}) = p_f(\mathbf{y} \mid \mathbf{x})$ for all machine elements $\mathbf{x}, \mathbf{y}$ connected to the input frame; and $d_e$ is Euclidean distance in three dimensions (a physically realized machine occupies a volume in the space $E^3$).

The constraints have been stated without reference to particular automata. In the case of the conditional probability pattern learning machine of A. M. Uttley (1959), $C_1$ would require the coincidence of x = 1 with y = 1 to be given greater weight, in a single "and" component eventually accepting inputs from both x and y, the closer together x and y lie in the input raster of photocells. In the "perceptron" proposed by F. Rosenblatt, a similar comment would apply to retinal input units and common association units, and $C_2$ would imply that such a requirement could perhaps be implemented by specifying that any otherwise random interconnections between association units be more likely to occur the closer such units lie to each other in the machine proper. In any event, in this paper $C_2$ is not dependent upon $C_1$ but is a distinct and separate constraint, and is adopted here to avoid commitment to some particular and possibly rather highly structured machine organization complying with $C_1$.
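Rosenblatt-style association units wired under $C_2$ might be sketched as follows. This is a hypothetical illustration: the exponential connection rule, the lattice of unit positions and all parameter values are assumptions, chosen only to show otherwise random interconnections becoming more likely the closer two units lie in $E^3$.

```python
import math
import random


def wire_units(positions, scale=1.0, p_max=0.9, seed=0):
    """Randomly interconnect units, with probability falling off with 3-D distance.

    Hypothetical rule: p(connect) = p_max * exp(-d / scale). The exponential
    form, p_max and scale are illustrative assumptions, not taken from the text.
    Returns a list of index pairs (i, j) with i < j.
    """
    rng = random.Random(seed)
    edges = []
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            d = math.dist(positions[i], positions[j])
            if rng.random() < p_max * math.exp(-d / scale):
                edges.append((i, j))
    return edges


# Units on a 6 x 6 x 6 lattice with unit spacing (an arbitrary choice).
units = [(x, y, z) for x in range(6) for y in range(6) for z in range(6)]
edges = wire_units(units)
mean_len = sum(math.dist(units[i], units[j]) for i, j in edges) / len(edges)
print(len(edges), round(mean_len, 2))
```

Any strictly decreasing function of distance would serve equally well; the exponential anticipates the form discussed later in this Section.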

We have made no comment as to the number of machine elements compared with the number of input frame bits, nor about what kind of connection schemes are to be preferred in mapping the latter on to the former. The existence of $C_1$ and $C_2$ implies some rather specific statements about such matters, but these further restrictions can be made to follow deductively from $C_1$ and $C_2$ together with a few rather obvious assumptions.

Let X be the set of input frame bits x, y, ..., z, with N(X) = n and $\sqrt{n}$ an integer, for a $\sqrt{n}$ by $\sqrt{n}$ array in $E^2$. X is thus a finite metric space. Then (Lemma 1) for n > 4, for every pair x, y in X there is at least one z in X such that $d_e(x, z) \neq d_e(y, z)$. Proof: since X lies in a Euclidean 2-space, the locus of points equidistant from x and y is a straight line linearly relating the two components of any elements of X which lie upon it. There can be no more than $\sqrt{n}$ such points, and at least $n - \sqrt{n} - 2$ remain other than x and y; this number is positive for n > 4, since then $\sqrt{n} \ge 3$.
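Lemma 1 is easy to confirm by exhaustive search on small grids. In this Python sketch (an illustration; the helper names are hypothetical) z is drawn from points other than x and y, as in the proof, and squared distances are compared so the test is exact. The n = 4 case fails, which is why the lemma requires n > 4: on a 2 by 2 grid the two remaining points both lie on the perpendicular bisector of a diagonal pair.

```python
import itertools


def lemma1_holds(side: int) -> bool:
    """Check Lemma 1 on a side x side grid (n = side**2 frame bits).

    For every pair x, y we look for some z, other than x and y, with
    d(x, z) != d(y, z). Squared distances keep the comparison exact.
    """
    grid = [(i, j) for i in range(side) for j in range(side)]

    def d2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    for x, y in itertools.combinations(grid, 2):
        if not any(d2(x, z) != d2(y, z) for z in grid if z not in (x, y)):
            return False
    return True


for side in range(2, 6):
    print(side * side, lemma1_holds(side))  # n = 4 fails; n = 9, 16, 25 pass
```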

We will denote by M the set of machine elements and by $\mathbf{X}$ that subset of them connected to the input frame as specified by some mapping $\varphi$ of X on to $\mathbf{X}$, $\mathbf{X} \subset M$. It is assumed that n > 4 in all that follows, and for the moment we shall exclude any one-many "mappings" of X on to $\mathbf{X}$. Then:

THEOREM 1: $\mathbf{X}$ has at least n elements.
Proof: if not, at least two members x, y of X map on to one $\mathbf{x}$ in $\mathbf{X}$. By the lemma we can find a z in X not equidistant from x and y, and which maps into $\mathbf{X}$ by definition. Then either $\varphi z = \mathbf{x}$ or not. If so, three points map on to one, and alternative stimulations by differently separated pairs (x, y) and (x, z) exert indistinguishable effects, contradicting $C_1$. If $\varphi z = \mathbf{z} \neq \mathbf{x}$, then $\varphi(x, z) = \varphi(y, z) = (\mathbf{x}, \mathbf{z})$ and a similar comment applies. Hence $N(\mathbf{X})$ is not less than n.

COROLLARY: M has at least n elements.

COROLLARY: No $\varphi$ is many-one.

COROLLARY: No x in X has more than one image in $\mathbf{X}$.

COROLLARY: $\varphi$ is one-one, since one-many "mappings" are excluded arbitrarily.

COROLLARY: $\mathbf{X}$ has exactly n elements.

LEMMA 2: Since $\varphi$ is one-one, $d_e(x, y) < d_e(x, z)$ implies $d_e(\mathbf{x}, \mathbf{y}) < d_e(\mathbf{x}, \mathbf{z})$, equalities and inequalities holding together. Proof: this lemma follows directly from $C_1$ and $C_2$. Since $\varphi(x, y, z) = (\mathbf{x}, \mathbf{y}, \mathbf{z})$ uniquely, the only way closer points in X can influence the machine to a greater extent than more distant points is through mapping respectively into closer and more distant machine elements. Important further consequences of this lemma will be discussed later.

THEOREM 2: $\varphi X = \mathbf{X}$ maps closest points into closest points. Proof: let x, y be closest; then of course there is no point closer than x to y in X. Suppose a $\mathbf{w}$ exists in $\mathbf{X}$ such that $d_e(\mathbf{y}, \mathbf{w}) < d_e(\mathbf{y}, \mathbf{x})$. Its preimage w in X can then not be closer than x to y. CASE I: $d_e(y, w) > d_e(y, x)$. By Lemma 2 this cannot imply $d_e(\mathbf{y}, \mathbf{w}) < d_e(\mathbf{y}, \mathbf{x})$. Thus only $d_e(y, w) \le d_e(y, x)$ apparently remains as a possibility. CASE II: $d_e(y, w) = d_e(y, x)$. By Lemma 2 this must imply $d_e(\mathbf{y}, \mathbf{w}) = d_e(\mathbf{y}, \mathbf{x})$, and hence cannot imply $d_e(\mathbf{y}, \mathbf{w}) < d_e(\mathbf{y}, \mathbf{x})$. CASE III: $d_e(y, w) < d_e(y, x)$. This, the only case which can imply $d_e(\mathbf{y}, \mathbf{w}) < d_e(\mathbf{y}, \mathbf{x})$, is explicitly forbidden by hypothesis. Hence $\mathbf{x}, \mathbf{y}$ are closest points in $\mathbf{X}$.

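COROLLARY: If x, y, ..., z are closest to w in X, then $\varphi(w, x, y, \ldots, z) = (\mathbf{w}, \mathbf{x}, \mathbf{y}, \ldots, \mathbf{z})$ in $\mathbf{X}$, such that x, y, ..., z and $\mathbf{x}, \mathbf{y}, \ldots, \mathbf{z}$ are respectively equidistant from w and $\mathbf{w}$.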
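Theorem 2 and its corollary can be illustrated by comparing, point by point, the sets of closest neighbours before and after a map. In this sketch `embed` is a hypothetical similarity embedding of a 4 by 4 frame into $E^3$ (scale, then rotation); an unequal stretching of the two axes is included for contrast. Both maps and their parameters are assumptions for illustration.

```python
import math


def closest_sets(points, tol=1e-9):
    """For each point index, return the set of indices of its closest other points."""
    result = {}
    for i, p in enumerate(points):
        dists = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        dmin = min(d for d, _ in dists)
        result[i] = {j for d, j in dists
                     if math.isclose(d, dmin, rel_tol=tol, abs_tol=tol)}
    return result


def embed(p, s=3.0, theta=math.radians(40)):
    """Hypothetical similarity embedding of the frame in E^3:
    scale by s, then rotate by theta about the second axis."""
    x, y = p
    return (s * x * math.cos(theta), s * y, s * x * math.sin(theta))


grid = [(i, j) for i in range(4) for j in range(4)]
frame = closest_sets(grid)
machine = closest_sets([embed(p) for p in grid])
stretched = closest_sets([(2.0 * x, 1.0 * y) for x, y in grid])

print(frame == machine)      # True: closest points map into closest points
print(sorted(frame[5]))      # [1, 4, 6, 9]: the four neighbours of grid point (1, 1)
print(sorted(stretched[5]))  # [4, 6]: after unequal stretching only two remain closest
```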

This theorem and its corollary form a finite-space analogy to the mapping of neighborhoods on to neighborhoods in infinite spaces with completely ordered distances, and hence suggest the notion of continuous mappings. A moment's reflection will show, however, that the analogy is far from perfect in the present context, and that many properties of topologically invariant mappings on infinite spaces do not hold. For example, we must not allow excessive bending or other "continuous" distortions which would violate Lemma 2 by upsetting the order of any triplets. In the case of infinite spaces, continuous mappings admit violent deformations such as stretching, warping, squeezing, and so on, and in fact all distortions not involving tearing or joining, because neighborhoods and their maps may be arbitrarily small. But where, as in this discussion, there is a lower bound to the size of a "neighborhood" (a set of closest points in a finite metric space), the permissible distortions generable by $\varphi$ may well be so gradual as to be almost imperceptible, especially when Lemma 2 is kept in mind. Were we certain that $\mathbf{X}$ itself formed a (perhaps warped) surface in $E^3$ of some useful kind, the question would be somewhat clarified, but this remains to be shown.

DEFINITION: A finite set of points in M is a "surface" if the set is embedded in a bounded infinite set which is a two-dimensional space lying within a volume in $E^3$ also containing M.

This definition, while intuitively reasonable, is not alone sufficient to establish useful conclusions about surfaces in M containing $\mathbf{X}$. To illustrate, since $N(\mathbf{X}) = n < \infty$ we can always pass m < n parallel planes through M in such a way that every member of $\mathbf{X}$ lies on one of them. Joining adjacent planes alternately on opposite sides of the volume containing M then generates one folded planar surface containing $\mathbf{X}$. Thus, while $\mathbf{X}$ always lies in some surface in M, we shall be more interested in surfaces bearing more useful relationships to the $E^2$ containing X. For example, it might be of some interest to consider surfaces on which closest points in X map as closest points, distances on the surface being measured along geodesics, or shortest curves, between points. Such an approach would not only involve us in rather formidable geometric difficulties, however, but would make use only indirectly, if at all, of one of our basic assumptions: that distances in M are to be measured by the Euclidean metric in three dimensions. We shall therefore attempt to derive characteristics of surfaces containing $\mathbf{X}$ which are "simplest" in some reasonable sense by building upon properties already assumed or deduced, and will start this process with consideration of lines and curves in X and $\mathbf{X}$. It is clear that the notion of a "curve" in X or $\mathbf{X}$ is related to that of a surface, but is of dimensionality one less. We shall confine our attention to curves built up of straight-line segments joining adjacent points in X or in $\mathbf{X}$, or, almost equivalently, to curves consisting of sequences of adjoining bits in X or in $\mathbf{X}$.

DEFINITION: A monotonic curve in X is a sequence $(x_i)$ in X, i = 1, 2, 3, ..., l, for which

$$d_e(x_i, x_j) < d_e(x_i, x_k) \quad\text{if and only if}\quad 1 \le i < j < k \le l,$$

where there is no $x_m$ such that $d_e(x_i, x_m) < d_e(x_i, x_j)$ or $d_e(x_j, x_m) < d_e(x_i, x_k)$; a monotonic curve in $\mathbf{X}$ is described as above by printing all subscripted literals in bold face.

THEOREM 3: If $(x_i)$ is a monotonic curve in X, then $\varphi(x_i) = (\mathbf{x}_i)$ is a monotonic curve in $\mathbf{X}$. Proof: directly from Lemma 2.

COROLLARY: Every straight line in X maps on to a monotonic curve in $\mathbf{X}$.
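Monotonicity is weaker than straightness. The sketch below, a hypothetical check written for illustration, tests only the distance-ordering clause of the definition (not the clause about intervening points $x_m$): a row of the frame passes, a gently bent circular-arc image of it still passes, and only a sharply bent one fails. The arc construction and its radius are assumptions. Straightness of the mapped grid lines therefore has to come from the further argument about the surface S.

```python
import math


def is_monotonic(seq) -> bool:
    """True if d(x_i, x_j) < d(x_i, x_k) for every i < j < k.

    Only the distance-ordering part of the definition is checked here;
    the adjacency clause about intervening points x_m is not tested.
    """
    n = len(seq)
    return all(
        math.dist(seq[i], seq[j]) < math.dist(seq[i], seq[k])
        for i in range(n) for j in range(i + 1, n) for k in range(j + 1, n)
    )


def arc(num_points: int, total_angle: float, radius: float = 10.0):
    """Hypothetical image of a straight grid line bent into a circular arc in E^3."""
    step = total_angle / (num_points - 1)
    return [(radius * math.sin(k * step), radius * (1 - math.cos(k * step)), 0.0)
            for k in range(num_points)]


line = [(float(i), 0.0) for i in range(8)]
print(is_monotonic(line))                   # True: a straight line
print(is_monotonic(arc(8, math.pi / 2)))    # True: a gentle bend stays monotonic
print(is_monotonic(arc(8, 1.5 * math.pi)))  # False: bending too far reverses the ordering
```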

With this corollary we have a clue to constructing an appropriate "simplest surface" S in M containing $\mathbf{X}$. For let the input frame be described as a $\sqrt{n}$ by $\sqrt{n}$ grid of $2\sqrt{n}$ straight lines, at the intersections of which are the frame bits $x_{ij}$, $1 \le i, j \le \sqrt{n}$, themselves. Then these grid lines map on to a family of $2\sqrt{n}$ monotonic curves in M whose intersections at the $\mathbf{x}_{ij}$ preserve the closest-point relationship derived in Theorem 2. It is easy to see that these mapped grid lines should not cross an odd number of times except at the $\mathbf{x}_{ij}$, or otherwise become tangled, lest the ordering so necessary for the monotonic property be disturbed. If we imagine a soap film stretched over the interstices of the grid in M, a reasonably vivid model of a "simplest surface" containing $\mathbf{X}$ results.

If we consider the sequence of sets of points closest to some one point in X, next closest, closest but two, and so on, Theorem 2 and Lemma 2 are satisfied if S is a plane or a hemisphere in $E^3$, but not otherwise. But the closest-points-into-closest-points relationship must hold for all $x_{ij}$ in X and $\mathbf{x}_{ij}$ in $\mathbf{X}$. Then, since a sphere cannot be tiled by "squares" whose sides are great circle segments (and whose corners are joined to their centers by equal chords, our distances in $E^3$), it follows necessarily that S must be a plane, and that the monotonic curves of the last corollary are in fact straight lines in S. This plane may be larger or smaller than the input frame; with respect to some fixed reference in $E^3$ it may be found arbitrarily translated; and of course it can have any rotational orientation about a translated point. The only admissible mappings $\varphi$, in other words, are the Euclidean similarity transformations in three dimensions, and no scale changes may occur from one part of the map to another.
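Both halves of this conclusion can be checked in a short worked example. Identify the input frame with the plane spanned by the first two coordinate axes of $E^3$, and write a similarity transformation as an orthogonal matrix Q ($Q^\top Q = I$), a single scale factor s > 0 and a translation t; the symbols Q, s, t, the bending map $\beta$ and its radius $\rho$ are introduced only for this illustration.

$$\varphi(x) = sQx + t \;\Longrightarrow\; d_e(\varphi x, \varphi y) = \lVert sQ(x - y) \rVert = s\, d_e(x, y),$$

so every distance is multiplied by the same factor and no triple ordering changes. By contrast, bending the frame onto a cylinder of radius $\rho$ while preserving lengths along one axis,

$$\beta(u, v) = \bigl(\rho \sin(u/\rho),\; v,\; \rho\,(1 - \cos(u/\rho))\bigr),$$

gives, for x = (0, 0), y = (2, 0) and z = (0, 2),

$$d_e(x, y) = d_e(x, z) = 2, \qquad d_e(\beta x, \beta y) = 2\rho \sin(1/\rho) < 2 = d_e(\beta x, \beta z),$$

because $\sin\theta < \theta$ for $\theta > 0$. An equality in the frame becomes an inequality in the machine, which Lemma 2 forbids however large $\rho$ is taken.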

With the preceding comments we thoroughly exhaust one-one mappings consistent with $C_1$ and $C_2$. These may be succinctly characterized as plane-into-plane, similar, and hence order-preserving in terms of the metrics applicable to image and map. We shall conclude this discussion of the mapping problem by examining some consequences of the question: is the class of allowable mappings augmented by limiting the distance in M over which $C_2$ is assumed effective? In other words, what are the consequences of assuming that $\mathbf{x}$ and $\mathbf{y}$ have no effect upon machine organization as pairs (over that which they could be expected to exert merely as two individual elements) if they are more widely separated than some limiting distance r in M? More formally, we modify $C_2$ as follows:


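$$
\begin{aligned}
&d_e(\mathbf{x}, \mathbf{y}) < d_e(\mathbf{w}, \mathbf{z}) < r \implies p_f(\mathbf{x} \mid \mathbf{y}) > p_f(\mathbf{w} \mid \mathbf{z}), \\
&\text{otherwise}\quad p_f(\mathbf{x} \mid \mathbf{y}) = p_f(\mathbf{w} \mid \mathbf{z}) = p_f(\mathbf{a}, t), \quad \mathbf{a} = \mathbf{w}, \mathbf{x}, \mathbf{y}, \text{ or } \mathbf{z}
\end{aligned}
\qquad (C_{2.1})
$$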


Clearly all that has been said before continues to hold for every subvolume in $E^3$ containing portions of M which is less than r in its longest dimension, except that the image of any corresponding portion of X will be bounded by some closed curve $\sigma$ in $\mathbf{X}$ (perhaps including the boundary of the input frame).

The effect of $C_{2.1}$, in other words, is to localize the restrictions upon $\varphi$ to regions of $\mathbf{X}$ in such a way that portions of the map more than r units away from each other have no direct mutual dependency, but are only related by virtue of intervening overlapping regions. But the class of mappings $\varphi$ remains as before because of the overlapping, unless r is impractically small. The adoption of $C_{2.1}$ in place of $C_2$ implies a corresponding and obvious modification of $C_1$; by $C_{1.1}$ we shall mean that the original constraint holds only within each $\sigma$. This ends the formal development.
In terms of mechanism, $C_{2.1}$ has a certain reasonable appeal: in an actual pattern-learning machine we can readily imagine that $p_f(\mathbf{x} \mid \mathbf{y})$ should be a much more sensitive function for closest $\mathbf{x}, \mathbf{y}$ than even for those twice as widely separated; clearly one of the most important metric properties of useful patterns for which we would desire early establishment of some machine correlate is adjacency of pattern bits. It thus would appear natural that $p_f(\mathbf{x} \mid \mathbf{y})$ fall off quite steeply as $d_e(\mathbf{x}, \mathbf{y})$ increases from its minimum non-zero value. An exponential or similar functional form would seem appropriate. In such a case, as separation increases there must come a point at which $p_f(\mathbf{x} \mid \mathbf{y})$ hardly differs from the sum of the independent effects of $\mathbf{x} = 1$ and $\mathbf{y} = 1$ separately, and $C_{2.1}$ merely assumes that this distance is significantly smaller than the dimensions of the volume enclosing the machine elements themselves. If r is very much less than these dimensions, $\mathbf{X}$ is still a plane in M, being built up of connected and overlapping small regions each of which is a portion of a plane.
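If, as suggested, the pair effect falls off exponentially, a cutoff r can be read off once a tolerance is chosen below which the effect counts as negligible. The sketch assumes a pair effect of the form $A e^{-d/\lambda}$ added to the independent single-element effects; A, $\lambda$ and the tolerance $\varepsilon$ are illustrative assumptions, giving $r = \lambda \ln(A/\varepsilon)$.

```python
import math


def pair_effect(d: float, amp: float, length: float) -> float:
    """Hypothetical excess of p_f(x | y) over the independent effects of
    x = 1 and y = 1, assumed to fall off exponentially with separation d."""
    return amp * math.exp(-d / length)


def cutoff_distance(amp: float, length: float, eps: float) -> float:
    """Separation r at which the pair effect has fallen to eps:
    r = length * ln(amp / eps)."""
    return length * math.log(amp / eps)


amp, length, eps = 0.5, 1.0, 1e-3  # illustrative values only
r = cutoff_distance(amp, length, eps)
print(round(r, 3))                                     # 6.215
print(math.isclose(pair_effect(r, amp, length), eps))  # True
# Doubling the separation multiplies the effect by exp(-1), about 0.368.
print(pair_effect(2 * length, amp, length) / pair_effect(length, amp, length))
```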

The preceding derivation of properties of one-one mappings consistent with $C_{1.1}$ and $C_{2.1}$ in an over-idealized machine has consequences of some significance owing to the very generality of the assumptions used. This specificity of allowable machine structures and maps really follows from the distance ordering and other properties of metric spaces in three dimensions. We inhabit such a space ($E^3$ in the small), and our machines exist in it as well. That the mild constraints adopted should necessarily lead to a more than merely "continuous" mapping of a visual field on to a specialized plane surface in a volume of essentially undifferentiated elements is perhaps not intuitively obvious and, since animals have evolved in this same space, might be expected to illuminate certain facts of neuroanatomy whose relationship has been obscure. We will return to this point in the next Section after a short and informal discussion of special one-many "mappings" of X on to M.

Under the assumptions adopted for the derivation of Theorem 1, many-one mappings of any sort are prohibited. In the case of the specialized class of plane surfaces in $E^3$, an input frame maps by similarity transformation on to one of the planes only. This relationship was derived as a consequence of limiting connections to one-one mapping rather than one-many. We now consider some one-many connection modes consistent with earlier developments.

As a first case, let us modify our point of view of the machines previously discussed by regarding the members of $\mathbf{X}$ primarily as terminations of connections from X. Then certainly other elements in M are influenced by stimuli arriving at the $\mathbf{x}$'s, since $C_2$ or $C_{2.1}$ can reasonably apply to all members of M if we amplify the statement of these constraints to include stimuli from elements in $\mathbf{X}$ as well. If the other elements in M are much larger in number than the $\mathbf{x}$'s (and hence than N(X)), we have a quasi one-many "mapping" of X on to M. This would become explicit were $C_2$ or $C_{2.1}$ to be implemented by specifying that interconnections between members of M (including those of $\mathbf{X}$) tend to fall off monotonically with distance in $E^3$, the mapping then being on to those elements in M which are not in $\mathbf{X}$ but which are connected to elements in $\mathbf{X}$. If the inverse distance function is invariant in form with respect to direction in $E^3$, and hence in $\mathbf{X}$, we may speak of a unimodal two-dimensional probability distribution of connections from an element $\mathbf{x}$, defined in a plane in M parallel to $\mathbf{X}$ (or perhaps containing $\mathbf{X}$), which is maximum closest to $\mathbf{x}$.
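The first one-many mode can be sketched by letting a single element of $\mathbf{X}$ send connections into a parallel plane of further machine elements, with connection probability falling off with distance in $E^3$. The exponential weight, plane offset, lattice and sample size are assumptions made for illustration; the resulting distribution of targets is unimodal and peaks at the element directly opposite the source.

```python
import math
import random
from collections import Counter


def sample_targets(source, plane_points, offset=0.5, length=1.0, k=5000, seed=1):
    """Draw k connection targets for one element, weighted by exp(-d / length).

    source: (x, y) position of the element within X; plane_points: lattice
    positions of elements in a parallel plane lying `offset` units away.
    The exponential weight, offset and length are illustrative assumptions.
    """
    weights = [
        math.exp(-math.sqrt((px - source[0]) ** 2 + (py - source[1]) ** 2
                            + offset ** 2) / length)
        for px, py in plane_points
    ]
    rng = random.Random(seed)
    return Counter(rng.choices(plane_points, weights=weights, k=k))


plane = [(i, j) for i in range(-5, 6) for j in range(-5, 6)]
counts = sample_targets((0, 0), plane)
print(counts.most_common(3))  # the element opposite the source is drawn most often
```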

A second variety of one-many connections corresponds to adoption of $C_{1.1}$ and $C_{2.1}$ only. Within the volume containing M we may specify more than one plane "simplest surface" as previously discussed, provided no two points on different surfaces lie less than r units of distance from one another. For k such surfaces we have k one-one mappings from X on to $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_k$ in M or, equivalently, one one-many "mapping" from X on to

$$\mathbf{X} = \bigcup_{i=1}^{k} \mathbf{X}_i.$$

Combinations of these two types of mappings offer a third, mixed category.

Similarity transformations do not permit changes of scale on the map. Hence if it is desired to obtain finer resolution in the center of the input frame than at the periphery, as would be appropriate if the input frame were to scan the visual field instead of passively receiving information from it, we must effect a preliminary transformation $\psi$ of the visual field on to the input frame. Two general kinds of techniques can be conceived to accomplish this. A nonlinear optical system might focus the visual field on to the input frame, giving the desired enhancement at the center. Here the formal statements of $C_1$ and $C_{1.1}$ remain unaltered, but refer to a visual field distorted by $\psi^{-1}$.

The second solution would crowd the input frame bits into a compact cluster in the center and space them more widely toward the edges or, alternatively, effect the same transformation $\psi^{-1}$ at an intermediate relay location between input frame and machine proper. In either of these cases, or in both combined, the Euclidean metric $d_e$ is no longer applicable to the input frame, and $C_1$ and $C_{1.1}$ must be altered accordingly. In terms of the new metric $d_1$ all of our previous results continue to hold, provided $d_1$ is substituted for $d_e$ wherever the latter appears. The visual field then maps on to $\mathbf{X}$ by the transformation $\psi\varphi$, as a moment's reflection will show is also the case for the optical solution. Geodesics in the map generated by $\psi$ on the input frame will in general no longer be straight lines in $E^2$, although special solutions may exist where this is true. It is essential in the foregoing that $\psi$ be order-preserving in point triples with respect to $d_e$ and $d_1$, and it would be desirable for local departures of $d_1$ from $d_e$ to be minimal, so that small areas of the visual field are not too distorted in their representation upon the plane containing $\mathbf{X}$. As we shall see, these considerations suggest a viewpoint from which to examine mammalian visual system neuroanatomy.
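One hypothetical way to crowd frame bits toward the center, in the spirit of the second solution, is to place successive rings of bits at radii growing as a power of the ring index. The power law and the parameter values are assumptions, not taken from the text; the sketch only shows a spacing that grows with eccentricity while radial order is preserved.

```python
def foveated_radii(num_rings: int, gamma: float = 1.6, c: float = 1.0):
    """Hypothetical radial layout of frame bits: ring k lies at radius c * k**gamma.

    With gamma > 1 successive rings are closer together near the center than
    toward the periphery, so resolution is finest at the center. The power-law
    form and the values of gamma and c are illustrative assumptions.
    """
    return [c * k ** gamma for k in range(num_rings)]


radii = foveated_radii(8)
spacing = [b - a for a, b in zip(radii, radii[1:])]
print([round(s, 2) for s in spacing])  # spacings grow with eccentricity
```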
V Visual System Neuroanatomy in Mammals


It was suggested in Section IV that, since animals have evolved in a metric space, the ordinary Euclidean world of three spatial dimensions, some of our earlier observations may cast light on hitherto obscure relationships between facts known to neurophysiologists and neuroanatomists. Certainly we would not expect clear contradictions between such facts and the generalized model under discussion if common features can be found characteristic of both.

Constraint $C_1$ is reasonable to assume as important in early visual learning in mammals. Considerable data exist to show catastrophic consequences of early deprivation of visual experience, notably von Senden's work with cataract patients who received their sight for the first time as adults (Hebb 1949). These people required many months, on the average, of patient coaching in order to learn to recognize simple geometric figures in various orientations. This research is significant in view of the fact that formation of myelin sheaths on axons in the visual (striate) cortex, primarily those in the outer pial surface and the inner white matter, occurs in humans early in infancy, followed by myelination in the surrounding peristriate cortex (Conel; Flechsig). Dendritic growth and ramification in the human cerebral cortex is especially rapid in the weeks immediately following birth, and again occurs in the striate area earlier than in the parastriate (Conel). In the primate occipital lobes, von Bonin, Garol, and McCulloch have shown (1942) by strychninization techniques that the striate transmits stimuli to the parastriate cortex only, except for slight edge effects at the boundary between these two areas, and that the striate cortex in turn is the only occipital region to show prompt time-locked evoked potentials in response to light flashed in the eyes of the anesthetized animal. There is a paucity of association fibers in the visual cortex, there being no association fibers found to extend more than five millimeters from a lesion to adjacent parts, nor anywhere else except to the parastriate area (Clark). The layout of the striate area thus emphasizes formation of associational connections between regions within it which are close to one another. These studies together indicate that the parastriate functionally builds upon the more fundamental striate area. The formation of myelin insulating sheaths is thought to have something to do with increasingly specific neural organization, and growth of dendrite trees certainly does. It appears that if elementary visual learning is not accomplished at the proper time in a human organism's growth cycle, it can be done later only with great difficulty, if it can be done at all. The possible role of optic tissue degeneration in humans is unfortunately obscure; no anatomical studies have been made on congenitally blind infants (Mendelson). Other studies have shown that cats and chimpanzees reared in darkness grow up to be quite disorganized animals (Riesen). While these data may not definitely imply some physiological and behavioral analogue of $C_1$ or $C_{1.1}$, the evidence seems all to be in that direction.

Concerning C2.1, the English neuroanatomist D. A. Sholl has published findings (1953) on connection pathways in the cat striate cortex which show a marked exponential falloff, with distance from the perikaryon (the neuron minus its axon and dendrite fibers), in the occurrence of a neuron's dendrites, the local connections in the cortex apparently being otherwise random. Haddara has since found (1955) a similar situation in the mouse. Uttley (1955) used Sholl's data to show that the probability of two neurons being connected together via a synaptic junction falls exponentially with increasing separation between them. Bullock has recently (1959) published evidence of decrementally spreading interneuron influences in simple ganglia in the lobster, propagating in a continuously decreasing manner with distance and without axon spike generation, and hypothesizes that such continuous and graded activity may be important in cortical functioning. The branchings of afferent nerve terminations in the middle layers of the cat striate cortex occur after the afferent loses the insulating myelin sheath it carried while entering through the lower (inner) layers. In Sholl's diagrams (1956) these branchings show the same dichotomous structure as the dendritic ramifications of the pyramidal and stellate neurons occurring in the same layers, to which exponential falloff applies, although he does not remark on this similarity. Sholl was careful to restrict his observations and measurements to flat regions of the cat striate cortex, since the effect of curvature in the convolutions was not clearly understood.
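To make the connection rule of Sholl and Uttley concrete, here is a minimal Python sketch, not the report's own construction, of a sheet of cells on a square grid wired at random, each pair independently, with a connection probability that falls off exponentially with separation. The values of `p0`, `length_scale`, and the grid spacing (in mm) are illustrative assumptions, not Sholl's measurements, and the helper names are hypothetical.

```python
import math
import random

def connection_probability(distance: float, p0: float = 0.5, length_scale: float = 0.1) -> float:
    """Exponentially decreasing chance that two cells `distance` mm apart are connected."""
    return p0 * math.exp(-distance / length_scale)

def wire_sheet(n: int, spacing: float, rng: random.Random,
               p0: float = 0.5, length_scale: float = 0.1) -> list[tuple[float, bool]]:
    """Return (distance, connected) for every pair of cells on an n-by-n grid."""
    cells = [(i * spacing, j * spacing) for i in range(n) for j in range(n)]
    pairs = []
    for a in range(len(cells)):
        for b in range(a + 1, len(cells)):
            d = math.dist(cells[a], cells[b])
            # Each pair is an independent random draw; only the odds depend on distance.
            connected = rng.random() < connection_probability(d, p0, length_scale)
            pairs.append((d, connected))
    return pairs

def connected_fraction(pairs: list[tuple[float, bool]], d_min: float, d_max: float) -> float:
    """Fraction of pairs with d_min <= distance < d_max that were wired together."""
    in_bin = [c for d, c in pairs if d_min <= d < d_max]
    return sum(in_bin) / len(in_bin) if in_bin else float("nan")

pairs = wire_sheet(n=12, spacing=0.05, rng=random.Random(1961))
for lo, hi in [(0.0, 0.12), (0.12, 0.3), (0.3, 1.0)]:
    print(f"{lo:.2f}-{hi:.2f} mm: {connected_fraction(pairs, lo, hi):.3f}")
```

The binned fractions fall off with distance even though every individual connection is drawn at random, which is the sense in which such a sheet is "randomly but decrementally" connected.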

More recently Bok (1959) in the Netherlands has produced evidence to support his own contention that nerve cell shape and connection patterns are modified in the sulci and gyri in a way appropriate to nullify the effects of cortical curvature in these regions, and states his belief that the cortex is essentially a flat organ. That such alterations to counteract effects of curvature are functional, and not accidental or due merely to mechanical crumpling, is persuasively argued by Bok's further observations that macroglial cells and capillary patterns remain unaltered from flat to curved cortical regions; their presumed respective functions of providing a mechanical skeleton and draining fluid require a constant volumetric density distribution in the cortex.

Visual pathways in the mammal form a highly complex system. The retina itself must be regarded as an extension of the brain (Hartline; Young), and much early processing of incoming information occurs there, such as sharpening of contrasts, emphasizing changes with time, and some accommodation to average intensity. After leaving the eyeball the optic nerve bundle terminates in the lateral geniculate body of the thalamus, from which cortical afferent nerve fibers radiate to the striate cortex, terminating there (in the cat) in dichotomous branchings in the middle third of its thickness and distributed over its surface at a presumably even density of about 25,000 fibers per mm² (Sholl). Since the striate region is at once the thinnest (1-2 mm in cats, about 1.6 mm in man) and the most regular part of the entire cerebral cortex, it is clear that the afferent endings lie in a surface very thin in comparison to its area (about 3,000 mm² in man (Sholl)) and, from the work of Bok and his colleagues, in a surface which functionally may be a plane.

There is no one-one mapping of nerve connections from retinal input cells on to striate cortex. Yet point light stimuli in the left or right half of the visual field seen by either eye of the mammal map one-one on the contralateral (right or left, respectively) occipital lobe, the site of the visual cortex, in terms of the center of the region of maximum evoked electrical response. This response field for a fixed point light stimulus falls off rapidly as the detecting electrode is moved away from the maximum spot. Visual maps plotted by this technique, as well as by nerve degeneration studies (Krieg; Polyak), show disproportionately large areas of the cortex serving foveal vision, with compression towards the edges of the striate area as the stimulus is moved toward the visual periphery (Hebb, 1949). In the chimpanzee this "eye map" upon each occipital lobe is quite accurately expressed as a distortion of a semi-polar grid to a family of semi-ellipses compressing logarithmically away from the central (foveal) region, which itself is larger than implied by the logarithmic relation alone (Marshall, Woolsey, and Bard). These mappings are more than merely topologically invariant. We do not observe widely separated points of the visual field mapping on to points of the cortex closer together than the maps of intermediate points, as often occurs, for example, in boundary regions of the continuous conformal mappings of function theory, always excepting the central split into two maps located on the two cerebral hemispheres. They are order-preserving with respect to triples in terms of the geodesics of image and map, as are all continuous mappings on metric spaces, but the extreme distortions allowed by continuity alone are not found in mammals, presumably because the striate cortex is in a sense a finite point set in E3, the lower limit to neuron spacing being set by the size of their perikarya. The maps appear to be an excellent natural compromise between the organism's conflicting requirements of expanded central vision, necessarily implying scale change, and fairly small local deviation from the Euclidean metric of the world it inhabits.
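The logarithmic compression of the eye map can be illustrated with a complex-logarithm model, w = log(z + a), where z is a visual-field position in one hemifield (in degrees) and w a cortical position. This model is an assumption swapped in for illustration, not the fit reported by Marshall, Woolsey, and Bard, and the constant `a`, which merely keeps the map finite at the fovea, is arbitrary. The sketch checks two properties discussed above: points along a meridian keep their order, and local magnification shrinks toward the periphery.

```python
import cmath

A = 0.5  # illustrative foveal constant in degrees (assumption)

def cortical_position(z: complex, a: float = A) -> complex:
    """Hypothetical eye map: visual-field point z -> cortical point log(z + a)."""
    return cmath.log(z + a)

def magnification(z: complex, a: float = A) -> float:
    """Local scale factor |dw/dz| = 1 / |z + a| of the log map."""
    return 1.0 / abs(z + a)

# Stimulus positions along the horizontal meridian at increasing eccentricity (degrees).
eccentricities = [0.0, 1.0, 2.0, 5.0, 10.0, 20.0, 40.0]
cortex_x = [cortical_position(complex(e, 0.0)).real for e in eccentricities]
scale = [magnification(complex(e, 0.0)) for e in eccentricities]

# Order along the meridian is preserved, so betweenness of triples survives the map...
assert all(p < q for p, q in zip(cortex_x, cortex_x[1:]))
# ...while the cortical distance per degree shrinks toward the periphery.
assert all(p > q for p, q in zip(scale, scale[1:]))
print([round(x, 2) for x in cortex_x])
```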

We have seen that a reasonably good case can be made for correlates in mammals of the constraints C1.1 and C2.1, and have noted the nature of the order-preserving visual mappings found in these animals and in man. In view of Nature's usual parsimony it would seem that some functional reason of great value to the organism must exist for such highly specific mapping of visual data on to a sheet of randomly but decrementally connected neurons. It is suggested that this mode of mapping, as well as the planar form of the visual cortex itself, was forced upon the evolutionary process by the prior existence of similar nerve cells such as are found in lower organisms (analogous to C2.1), by the properties of Euclidean space, and by the evolving mammal's need to organize its perceptual data on a foundation suited to that space and to the objects in it (analogous to C1.1), objects which the animal had to learn to recognize swiftly and surely in order to survive in competition with organisms possessing less flexible nervous systems. Unfortunately this suggestion, inferred from the development in Section IV, is not open to objective verification, although it seems phylogenetically persuasive.

In that development it was pointed out that change in map scale should not be obtained by varying the density of connections over the surface of machine elements X, and that any local scale changes had to be made by some prior transformation before C2.1, with its dependence upon $d_3$ in E3, entered the picture. Insofar as the analogy holds for mammals, this transformation appears to be performed in the retina and perhaps in subcortical structures, the latter also generating extra fibers to spread evenly over the striate area. In man there are about 1,010,000 fibers in the optic nerve (Sholl, 1956), which, on the certainly erroneous assumption of one-for-one synaptic relaying in the lateral geniculate body, would yield the unreasonably low average density of about 300 per mm² at the area striata. The corresponding figure for the cat is 25,000 per mm², measured by Sholl for a section of visual cortex beneath a plane pial surface (1956). Even though cortical neurons are more widely separated in man than in cat (Bok; Sholl, 1956), this discrepancy is too great to be tolerated. Cross-species comparisons of this kind are not without an element of risk, however: Clark states that in the monkey 1,350 geniculate cells project on to each square millimeter of visual cortex (1942). Yet examination of photomicrographs of sections of striate cortex in cat, monkey, and man does not suggest widely varying ratios of the volume of combined afferent and efferent axons (white matter) to the volume of visual cortex served by them (Sholl, 1956; Von Bonin and Bailey; Von Bonin, McCulloch, and Bailey), and hence it does appear that there are many more fibers in the optic radiations than in the optic nerve.
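The fiber-density argument above is simple arithmetic. The short Python sketch below reproduces it from the figures quoted in the text, on the same one-for-one relay assumption the text calls certainly erroneous; it treats the 3,000 mm² figure as the striate area served by the 1,010,000 optic nerve fibers, which is what the text's estimate of about 300 per mm² implies.

```python
# Figures as quoted in the text.
optic_nerve_fibers_man = 1_010_000   # fibers in the human optic nerve (Sholl, 1956)
striate_area_man_mm2 = 3_000         # human striate area, mm^2 (Sholl)
cat_afferent_density = 25_000        # afferents per mm^2 in the cat (Sholl, 1956)
monkey_geniculate_density = 1_350    # geniculate cells per mm^2 in the monkey (Clark, 1942)

# Density if every optic nerve fiber were relayed by exactly one geniculate cell.
one_for_one_density = optic_nerve_fibers_man / striate_area_man_mm2

print(f"man, one-for-one relay: {one_for_one_density:.0f} fibers per mm^2")
print(f"cat / man ratio:    {cat_afferent_density / one_for_one_density:.0f}")
print(f"monkey / man ratio: {monkey_geniculate_density / one_for_one_density:.1f}")
```

A factor of roughly seventy between cat and man is the "discrepancy too great to be tolerated"; even the monkey figure is about four times the one-for-one estimate for man.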

An interesting corroboration of the present theory would result if subsequent neuroanatomical studies were to show that the density of input afferents (the optic radiations) is indeed constant, or very nearly so, over all of the striate cortex, correcting for curvature by converting to the Cartesian form of the Gaussian coordinates used by Bok to metrize his cortical sections.

Returning to the continuous mapping of the visual field on to the area striata, the burden of our discussion is that metric attributes of visual objects are enhanced thereby, and it has been suggested that a distinct survival value is served by this enhancement. The arguments by analogy in Sections II and III do not favor formation of such spatially highly specific nerve paths on the basis of a mammal's short visual experience during early life; it is much more probable that all of evolutionary time was needed for their specification. Accordingly, we may predict that visual mapping studies on mammals reared from birth in light-free and pattern-vision-free environments will show evidence of continuous maps on the striate area. But it is also reasonable to expect that such maps will be diffuse in fine structure when compared with those of normal animals, which have had the advantage of early visual experience in our Euclidean space to flesh out the skeleton, as it were, of their common genetic endowment. Similar remarks would apply to mammals whose early visual experience is not of a metric space at all but of randomly changing spots, as on a detuned television screen. Such diffusely lighted or flickering environments would preserve monkeys from degeneration of retinal ganglion cells and, in the latter case, would exercise "on-off" optic functions as well. Retinal degeneration has never been encountered in cats raised in complete darkness, however, even after a three-year confinement (Riesen, 1960).

Studies are planned for the coming year to check some of the hypotheses discussed in this Section by appropriate experimental techniques. But in view of the available evidence it would seem almost certain that a major function of the area striata is the facilitation and formation of sensory-sensory associations between contemporaneous elementary stimuli from the visual field which are spatially contiguous or quite close together, and continuous, order-preserving maps are a necessary precondition for this. The basic integration of such contiguous stimuli into the elementary perception of straight lines as a result of eye movement in the presence of a visual field containing point light contrasts has, for example, been extensively analyzed by Hebb (1949).

Cortico-cortical connections within the striate area are significantly local. Association fibers leaving the monkey visual cortex re-enter it immediately, most of them within 2 millimeters and decrementally distributed as required by our theory, and except for those destined for the parastriate or the superior colliculus all have re-entered within 5 millimeters (Clark; Polyak, pp. 435-436). Braitenberg has described similar but intra-cortical horizontal local myelinated connections in the region of Gennari's stripe, roughly the middle-third layer. Continuous or topological mappings are defined as those which map all small regions into corresponding small regions. That such an operation necessarily generates an often distorted but usually recognizable map of the original figure, in much the same way that successive joining of individual links eventually produces a chain, is in this case merely a neuroanatomical accident due to the mathematical fact that Euclidean space is also a topological space (as are all metric spaces). Except in the case of Hebb, whose theory nicely accommodates it, this accidental consequence seems to have confused our view of the neurology of visual perception for a generation.


VI Discussion and Summary


All of the mechanisms discussed in this paper are essentially static. Where time changes have been introduced, as in the concepts of f(x, t), p_t(x | y), and C1 and C2 in Section IV, rather slow changes "clocked" by t = 1, 2, 3, ..., the sequence of pattern learning trials, are implied. We have not introduced any reverberatory, scanning, or noise-like mechanisms, although analogues to these may well exist in living organisms and certainly are often encountered in pattern recognition schemes programmed upon digital computers. For example, Hubel has found exceptions to the simple visual map in the cat when more or less rapid motion is taken into account. Some single neurons in the cat striate cortex fire only when a point light source is moved across the visual field but not when it is merely blinked on and off, while others only cease firing under such conditions; and still others are sensitive to movement of large objects across the field. Whether such effects are due to genetically specified anatomy alone or need early-life visual experience to become evident might be clarified by experiments with animals reared in darkness. Such special temporal activity is not inconsistent with our previous developments; it is merely that the viewpoint of this paper relegates it to the category of means or does not encompass it. The slow changes referred to earlier are intended to be more in line with Hebb's postulate, applicable in principle to mechanical (machine elements x) as well as to biological neurons: "When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased." (Hebb, p. 62). This assumption is basic to the formation and stabilization of Hebb's cell assemblies and phase sequences, concepts which have had a profound influence upon modern behavioral studies and which indeed have been mathematically shown by Beurle to be applicable as well to a sheet of highly stylized neuron-like elements obeying the exponentially falling-off connectivity mode of Sholl and Uttley.
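Hebb's postulate can be read as an update rule applied once per learning trial t = 1, 2, 3, .... The Python sketch below is one common formalization, not Hebb's own and not the report's: the efficiency of A in firing B grows whenever A and B are active on the same trial, here by a bounded increment toward 1.0. The learning rate, the saturation at 1.0, the absence of decay, and the use of joint activity as a stand-in for "takes part in firing it" are all assumptions.

```python
def hebbian_update(weight: float, a_active: bool, b_active: bool, rate: float = 0.1) -> float:
    """One learning trial of a Hebb-style rule (illustrative formalization).

    If cell A is active on the same trial as cell B, A's efficiency in firing B
    grows by a fraction `rate` of the remaining headroom below 1.0; otherwise it
    is left unchanged (no decay is modeled).
    """
    if a_active and b_active:
        weight += rate * (1.0 - weight)
    return weight

# Trials t = 1, 2, 3, ...: (A active, B active) on each trial.
trials = [(True, True), (True, False), (True, True), (False, True), (True, True)]
w = 0.2  # initial efficiency of A in firing B (arbitrary)
for t, (a_on, b_on) in enumerate(trials, start=1):
    w = hebbian_update(w, a_on, b_on)
    print(f"t={t}: w={w:.4f}")
```

The change per trial is small and occurs only on trials with joint activity, which matches the slow, trial-clocked changes the text has in mind rather than any fast temporal mechanism.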

The attempt has, rather, been made to arrive at a most general description of pattern-learning devices by consideration of principles which must hold for all such machines. One such principle is that completely random "organization" is not practical, because any sequence of equisignificant patterns of reasonable length is simultaneously a subclass of an impossibly large number of abstract pattern (or rather, frame) classes, only a small fraction of which is desired. Implications of this important point have previously been overlooked, with few exceptions (Day and Newman; Kalin; Beer).
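The first principle rests on a counting fact. If an input frame has n bits there are 2^n possible frames, and an abstract frame class is any subset of them, so a given set of m distinct frames belongs to 2^(2^n − m) different classes. The Python sketch below, with hypothetical helper names, checks that formula by brute force for tiny n and then shows the size of the exponent for a modest 10 by 10 binary input frame and 1,000 training patterns (both illustrative sizes).

```python
from itertools import combinations

def log2_classes_containing(n_bits: int, m_frames: int) -> int:
    """log2 of the number of frame classes (arbitrary sets of n-bit frames)
    that contain a given set of m distinct frames: 2**n_bits - m_frames."""
    n_frames = 2 ** n_bits
    if not 0 <= m_frames <= n_frames:
        raise ValueError("m_frames must lie between 0 and 2**n_bits")
    return n_frames - m_frames

def brute_force_count(n_bits: int, sample: set[int]) -> int:
    """Enumerate every subset of the 2**n_bits frames and count those containing sample."""
    frames = range(2 ** n_bits)
    return sum(
        1
        for k in range(len(frames) + 1)
        for cls in combinations(frames, k)
        if sample <= set(cls)
    )

# Tiny case: 2-bit frames {0, 1, 2, 3}; classes containing frames 0 and 3.
assert brute_force_count(2, {0, 3}) == 2 ** log2_classes_containing(2, 2) == 4

# A 10 by 10 binary input frame and 1,000 training patterns.
exponent = log2_classes_containing(100, 1_000)
print(f"the training set lies in 2**{exponent} frame classes")
```

The exponent alone has 31 decimal digits, which is why a device with no built-in preference among these classes cannot be expected to single out the desired one.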

A second principle is that the kinds of visual patterns that interest us are built up from fundamental ordering properties of Euclidean space. Indeed, for much of our analysis a space with completely ordered distances and without an overt metric would have sufficed. A third principle follows from the previous two: some sort of structure must be imposed upon initial machine chaos to enable an automaton to emphasize these ordered properties of patterns. While almost all workable pattern recognition devices exhibit a highly sophisticated structure (for example, Bledsoe and Browning; Roberts; Doyle), we have been content to state two quite mild constraints upon initial randomness and rigorously to exhaust their consequences.

It has turned out that these mild constraints are sufficient to couple our devices effectively to Euclidean space, the properties of which were shown to require mapping of an input frame on to a plane with no local scale change. This unexpectedly specific conclusion was derived independently of any particular inverse functional dependence of machine organization change on the distance of separation of simultaneously stimulated pattern bits, provided only that proper monotonicity be preserved. In retrospect it is easy to see that a certain controlled amount of warping and stretching of the map would have been permitted by the addition of some suitably small numerical tolerance ±ε to the antecedents of the implications C1 and C1.1 in Section IV, closest points in X then mapping inside an annulus in X of thickness 2kε, where k is the dilation of φ. But there seems to be no real need for this additional complexity; indeed, Sholl's data on the exponential falloff of intracortical connections imply the contrary (1956, Fig. 8). The inclusion of this artifice in the formal development would complicate the proofs as the price of the greater generality obtained, but in turn might make the analogy between our formal development and mammalian visual system anatomy more palatable.

As it is, this analogy seems intriguing enough. It connects the hitherto unrelated facts of topologically invariant visual maps and locally random but exponentially falling-off connection modes in the interior of the mammalian visual cortex in terms of one theory which has a plausible foundation in evolution. Indeed, as we ascend the phylogenetic scale through rabbit, cat, and primate to man, we find the primary visual center progressively displaced from the tectum in the old brain, increasing lamination of the lateral geniculate relay nucleus as both visual fields progressively overlap, and increasing focusing of optic radiations from that nucleus on to the area striata of the cerebral cortex, until in man that is their only, sharply defined, terminal site (Bishop, Burke, Davis, and Hayhow).

The principles underlying the theory are clearly applicable to other cortical sensory projection areas where ordering is important. The dog middle ectosylvian auditory cortex, for example, is functionally laid out in a sequence of side-by-side parallel strips, each responding to about 0.1 octave of sound from the contralateral ear, with no spatial reference to frequency along the strip. This second dimension is correlated instead with the intensity of sound of the same frequency from the ipsilateral ear, so that a point on the two-dimensional area corresponds not only to sound frequency but to the intensity ratio as sensed by the two ears (Tunturi).
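Tunturi's two-dimensional layout can be expressed as a coordinate pair. In the Python sketch below, the position across the strips counts 0.1-octave steps from a reference frequency, and the position along a strip is taken as the ipsilateral-to-contralateral intensity ratio in decibels. The reference frequency of 1,000 Hz, the use of decibels, and the function name are assumptions made for illustration only.

```python
import math

def auditory_map_point(freq_hz: float, ipsi_intensity: float, contra_intensity: float,
                       ref_freq_hz: float = 1000.0, octave_per_strip: float = 0.1):
    """Hypothetical (across-strip, along-strip) coordinates for a tone.

    across: signed number of 0.1-octave strips above ref_freq_hz
    along:  ipsilateral / contralateral intensity ratio, in decibels
    """
    across = math.log2(freq_hz / ref_freq_hz) / octave_per_strip
    along = 10.0 * math.log10(ipsi_intensity / contra_intensity)
    return across, along

# One octave above the reference, ten times as intense at the ipsilateral ear.
print(auditory_map_point(2000.0, 10.0, 1.0))
```

Because the across-strip coordinate is logarithmic in frequency, equal musical intervals occupy equal numbers of strips wherever they fall in the audible range.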

Of course the actual neurological mechanisms involved in hearing and sight are incredibly more complex than the present naive model would suggest. No real mention has been made of binocular vision and of what role the delicately sculptured laminae in the lateral geniculate body, with their precise geometry, might play in this connection, nor of oculomotor feedback paths and their possible function in pattern learning; nor has more than passing reference been made to the further elaboration upon fundamentally ordered data from the visual field so clearly necessary to effective recognition of instances of abstract visual forms by machine or mammal. And, lastly, we can only allude to the primal mystery of visual perception: how all of these diverse but coordinated neurological phenomena somehow coalesce in the unified conscious experience of seeing our world.
AFFERENT or AFFERENT FIBER

An axon, originating elsewhere, that enters a region of nervous tissue upon which it exerts an effect.

AXON

That portion of a nerve cell, or neuron, whose main function is the propagation of an all-or-none discharge from the main cell body to distant points. Axons may be quite long (up to three feet in the peripheral nervous system) but ordinarily extend a few millimeters or centimeters in the cerebral cortex. "White matter" consists almost exclusively of myelinated axon fibers. Antidromic or backward firing of axons is easily induced in the laboratory but probably occurs only rarely in living animals, and then only in the peripheral nervous system.

BINARY  QUANTIZATION,  BIT 

BIT 

Contraction of Binary Digit. The binary digit, 0 or 1, is the elemental unit of information in modern communication and information theory (Shannon) and is featured prominently in this paper because of close analogies to inclusion or exclusion of an element in a set or class, to combinatorial operations on integers, and to the "on-off" mode of operation of modern digital electronic computers. A binary number is a positional sequence of binary digits weighted by successive powers of two; thus any integer $I$ with $0 \le I < 2^n$ can be written

$$I = \sum_{i=0}^{n-1} b_i \, 2^{i},$$

and any non-negative real number $N$ as

$$N = \sum_{i=-\infty}^{\infty} b_i \, 2^{i},$$

where the $b_i$'s are binary digits (only finitely many of the $b_i$ with $i \ge 0$ being nonzero). These simple relationships allow us to "quantize" lengths, magnitudes, threshold values, etc., to a reasonable degree of fineness in terms of an interval containing discrete jumps, and thus to express any point within the interval in terms of the step on which it lies, which in turn is typified by a binary number or equivalently by a sequence of binary digits. To avoid constant repetition, "bit" is also used in this paper to denote an element of an input frame as well as the binary quantized value of that element, either 0 or 1, a two-step quantization only.
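The quantizing procedure described above can be sketched in a few lines of Python: divide an interval into 2^n equal steps, find the step on which a value lies, and write that step number as n binary digits b_{n-1} ... b_0. The half-open interval [lo, hi) and the function names are assumptions for illustration.

```python
def quantize(value: float, lo: float, hi: float, n_bits: int) -> list[int]:
    """Binary digits b_{n-1} ... b_0 (most significant first) of the step on which
    `value` lies when [lo, hi) is divided into 2**n_bits equal steps."""
    if not lo <= value < hi:
        raise ValueError("value outside the quantizing interval")
    steps = 2 ** n_bits
    step = int((value - lo) / (hi - lo) * steps)
    step = min(step, steps - 1)  # guard against floating-point round-up at the top
    return [(step >> i) & 1 for i in reversed(range(n_bits))]

def from_bits(bits: list[int]) -> int:
    """Recover I = sum of b_i * 2**i from digits listed most significant first."""
    return sum(b << i for i, b in enumerate(reversed(bits)))

print(quantize(0.7, 0.0, 1.0, 3))  # step 5 of 8
print(quantize(0.8, 0.0, 1.0, 1))  # the two-step quantization used for bits
```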


COMPLETE ORDERING

In the abstract, a requirement that a set of elements $\{a\}$ be susceptible to relational predicates $<$ and $=$ in such a way that one and only one of $a < b$, $a = b$, or $b < a$ is true for all $a, b$ in $\{a\}$, that if $a < b$ and $b < c$ then $a < c$, and further, that if $a = b$ then $b = a$. $\{a\}$ is then said to be a completely ordered set or space. When these predicates are conventionally interpreted as "is less than" and "is equal to" applied to real numbers, it is easy to see that a finite metric space $\{x\}$ of $n$ elements is always associated with a completely ordered set $\{d(x, y)\}$ of $\tfrac{1}{2}n(n-1)$ elements, the distances between pairs. If such complete ordering cannot be accomplished, then of course the corresponding point set is not metrizable. In any case a completely ordered set does not correspond uniquely to some one metric space, since it embodies a more general notion than quantifiable distance. The basic arguments in Section IV (Lemma 2, Theorems 2 and 3) are based upon complete distance ordering. "Ordering" in this report means "complete ordering", since we do not work with partially-ordered sets, those for which "=" does not apply.
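The link between a finite metric space and a completely ordered set of distances can be checked directly. The Python sketch below lists all n(n − 1)/2 pairwise Euclidean distances of a small point set in order, and shows that doubling every coordinate changes the distances but not their order: one illustration of why a complete distance ordering does not single out one metric space. The example points are arbitrary, and ties are assumed absent.

```python
import math
from itertools import combinations

def pair_distances(points):
    """Map each unordered index pair (i, j) to the Euclidean distance between the points."""
    return {(i, j): math.dist(points[i], points[j])
            for i, j in combinations(range(len(points)), 2)}

def distance_ordering(points):
    """Index pairs sorted by increasing distance (assumes no two distances tie)."""
    d = pair_distances(points)
    return sorted(d, key=d.get)

points = [(0, 0), (3, 0), (0, 4), (1, 1)]
scaled = [(2 * x, 2 * y) for x, y in points]  # a different metric space, same ordering

n = len(points)
assert len(pair_distances(points)) == n * (n - 1) // 2
assert distance_ordering(points) == distance_ordering(scaled)
print(distance_ordering(points))
```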


CORTEX

Same as cerebral cortex in this paper. From the Latin for "bark"; the outer few millimeters of the cerebral hemispheres in mammals. Cortical tissue is "grey matter", from the characteristic appearance of unmyelinated nerve protoplasm. Frogs and reptiles, as well as birds, do not have a cerebral cortex or, more properly, a neopallium. Except for the sensory and motor projection areas, and in most cases the left temporal lobe "speech center", the human cortex shows a puzzling lack of correlation of function with structure (Hebb). Decrease in functional efficiency seems to depend primarily upon the amount of non-specific cortical material removed. This is quite opposed to the more genetically specific functional connections in the brains of lower phyla, which, on the other hand, exhibit more rudimentary learning activity. Damage to the visual cortex, by contrast, results in a permanent localized visual defect. Polyak's case "Mallory" is a detailed thirty-year clinical study of such a lesion.

DENDRITES

Branching processes extending from the perikaryon of a neuron, whose main function seems to be integration of impulses received through synapses from the axons of other neurons. Recent work suggests, however, that some dendrites may never undergo explosive "all or none" firing activity (Bullock). In the central nervous system dendrites may account for as much as nine-tenths of total neuron volume.

EFFERENT or EFFERENT FIBER

An axon of a neuron within a region of nervous tissue but leaving it to produce an effect elsewhere.
EQUISIGNIFICANCE

Meaning the same thing. The basic problem in pattern recognition is the classification of patterns into classes the members of which have the same meaning, as Selfridge pointed out in 1955. Reichenbach employed a similar notion in defining a "symbol" as an equisignificance class of "tokens," or particular physical signs (Reichenbach, p. 4). In introducing this notion in Section II, I have emphasized that two or more patterns of the same class are to have the same meaning to the machine's designers and operators in order to stress the pragmatic aspects of the concept. To say that a machine can handle meanings directly is gratuitous; it is more to the point to recognize that machines deal with physical states and events which symbolize meanings for us. It is then easier to appreciate that other physical entities which the machine might handle equally well may have no meaning, for example, most frames in the abstract classes discussed in Section III.

EUCLIDEAN METRIC

As used in Section IV and elsewhere in this paper,

$$d_2(x, y) = \left[(x_1 - y_1)^2 + (x_2 - y_2)^2\right]^{1/2},$$

where $x = (x_1, x_2)$, $y = (y_1, y_2)$ in Cartesian coordinates, and

$$d_3(x, y) = \left[(x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2\right]^{1/2},$$

where $x = (x_1, x_2, x_3)$, $y = (y_1, y_2, y_3)$, also in Cartesian coordinates. The two sets of points X and X are respectively embedded in the infinite Euclidean spaces E2 and E3, metrics on which of course remain invariant in value for given points with respect to any change of coordinate system adopted, but assume in general different forms of expression for different coordinate schemes. This is the metric peculiar to our immediate spatial world, in which instances of visual patterns are found as well as machines for recognizing them, and of course our own central nervous systems have evolved and exist in Euclidean space as well. In this world E2 is included in E3 and is not a separate entity, as is often convenient to assume in the abstract. It is in this sense that we speak in Section IV of some "specialized plane" in a volume of machine elements and, in Section V, of the cortical surface evolving in E3.
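The two metrics above, and the remark that E2 sits inside E3, can be checked with a short Python sketch: a point of the plane embedded as (x1, x2, 0) keeps its distances, and a rotation of coordinates leaves d3 unchanged in value. The function names and the choice of the plane x3 = 0 are assumptions for illustration.

```python
import math

def d2(x, y):
    """Euclidean metric on E2 in Cartesian coordinates."""
    return math.sqrt((x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2)

def d3(x, y):
    """Euclidean metric on E3 in Cartesian coordinates."""
    return math.sqrt((x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2 + (x[2] - y[2]) ** 2)

def embed(p, x3=0.0):
    """Place a point of E2 in the plane of constant third coordinate x3 within E3."""
    return (p[0], p[1], x3)

def rotate_about_x3(p, angle):
    """Rotate a point of E3 about the x3 axis (a change of Cartesian coordinates)."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1], p[2])

p, q = (0.0, 0.0), (3.0, 4.0)
assert d2(p, q) == d3(embed(p), embed(q)) == 5.0
rotated = d3(rotate_about_x3(embed(p), 0.7), rotate_about_x3(embed(q), 0.7))
assert math.isclose(rotated, 5.0)
```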

FRAME

Any one of the $2^n$ possible configurations of binary digits corresponding to the binary quantizing modes of an input frame.

GANGLION CELLS

In  this  paper,  retinal  ganglion  cells.  These 
are  neurons  in  the  retina  whose  axons  form 
the  fibers  of  the  optic  nerve. 

GEODESIC 

A curve in a metric space along which all distances are shortest distances: for all points x, y, z lying in that order along such a curve, $d(x, y) + d(y, z) = d(x, z)$.
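The defining equality can be checked numerically for finitely many points sampled in order along a curve. The sketch below assumes the Euclidean metric of the plane and a small tolerance for floating-point error; `is_geodesic` is a hypothetical helper. Passing the test at sample points is necessary for a geodesic but, for an arbitrary curve, not sufficient, since the curve between the samples is not examined.

```python
import math

def is_geodesic(path, tol=1e-9):
    """True if every ordered triple x, y, z of sampled points satisfies
    d(x, y) + d(y, z) == d(x, z) within tol (Euclidean metric assumed)."""
    n = len(path)
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                x, y, z = path[i], path[j], path[k]
                if abs(math.dist(x, y) + math.dist(y, z) - math.dist(x, z)) > tol:
                    return False
    return True

straight = [(0.0, 0.0), (1.0, 1.0), (2.5, 2.5), (4.0, 4.0)]  # points on a line, in order
bent = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]                  # a path with a corner
print(is_geodesic(straight))  # True
print(is_geodesic(bent))      # False
```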


INPUT FRAME

In the case of visual-pattern-recognizing machines, a finite two-dimensional array of quantizing optical transducers.
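A sketch of how such an array might turn light intensities into frame bits, assuming each transducer compares its intensity with one fixed threshold and that the bits are read out row by row. The threshold value 0.5 and the helper names are hypothetical; the report does not specify a quantizing rule.

```python
def quantize(intensities, threshold=0.5):
    """Binary quantization of a 2-D array of light intensities.

    Each transducer reports 1 if its intensity reaches the (hypothetical)
    threshold, else 0.
    """
    return [[1 if v >= threshold else 0 for v in row] for row in intensities]

def frame_bits(binary_array):
    """Read the quantized array row by row into a tuple of n = rows * cols bits."""
    return tuple(bit for row in binary_array for bit in row)

image = [
    [0.1, 0.9, 0.2],
    [0.8, 0.7, 0.6],
    [0.0, 0.95, 0.3],
]
binary = quantize(image)
print(binary)              # [[0, 1, 0], [1, 1, 1], [0, 1, 0]]
print(frame_bits(binary))  # (0, 1, 0, 1, 1, 1, 0, 1, 0)
```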

LATERAL  GENICULATE  BODY 

Syn.: Lateral geniculate relay nucleus, lateral geniculate nucleus. One of two subcortical structures, each of which receives optic fibers from the retinas of both eyes, corresponding to one half of the visual field, and sends fibers to the striate area of its own hemisphere. A laminar structure becomes increasingly evident as the phylogenetic scale is ascended, optic fibers from the two eyes ending in alternate laminations; in the cat these laminations share some neurons at their interfaces, but in primates and man they do not. The function of this body seems intimately related to binocular vision, and perhaps to color vision in primates and man (Clark). From the viewpoint of the present paper, additional functions may include participation in the transformation of Section IV and the generation of extra fibers, as suggested, for example, by the microscopic geometry of the geniculate body of the cat (Fig. 1 of Bishop, Burke, Davis, and Hayhow; Fig. 216 of Polyak), of the monkey (Fig. 211, Polyak), and of man (Fig. 213, Polyak).

METRIC

A distance measure on a set. A metric $d(x, y)$ on a set of elements $\{x\}$ assigns to each pair a real non-negative number, the “distance” from x to y. This number is zero if and only if x and y coincide, and is the same from y to x as from x to y. Distance means shortest distance: all triples x, y, z in $\{x\}$ satisfy the “triangle inequality” $d(x, y) + d(y, z) \ge d(x, z)$. The set $\{x\}$ is then said to be a metric space (Busemann; Hilbert and Cohn-Vossen). Euclidean space is an example, as is the hyperbolic space of Lobatchevskian geometry. The trivial metric space defined by $d'(x, y)$ in Section III is an instance of Busemann’s metric space Rt. The finite metric spaces considered in this paper refer directly or indirectly to an input frame upon which a pattern is quantized into n bits, and never to a space of $2^n$ points whose elements stand for possible frames. In other words, we have not ordered, let alone metrized, any class of equisignificant patterns. Such a class is a subset of the space of $2^n$ frames, and its elements can simultaneously be members of many distinct pattern classes, as was argued in Section III. An observer confronted with this situation would no doubt prefer to order these elements differently according to their resemblance to some one “archetypical” pattern in each of the overlapping equisignificance classes. Thus the subset would not be metrizable as such, since no unambiguous ordering of distances would be available. It is possible, on the other hand, to order adequately the patterns within one equisignificance class with respect to an archetype of that class if all other classes are ignored. It has also proved possible in practice to assign an unknown pattern, lying in one of a limited number of largely mutually exclusive equisignificance classes, to its proper class by measuring its distance (suitably defined) from each of the several archetypes in turn and choosing the class whose archetype is nearest (Bledsoe and Browning).
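The metric axioms listed above can be checked by brute force on a finite set. The Python sketch below, assuming points are hashable coordinate tuples, applies such a check to the bit positions of a 3 by 3 input frame: the Euclidean distance passes, while squared Euclidean distance fails the triangle inequality. It then illustrates the nearest-archetype procedure of the last sentence. The distance used there, the Hamming distance (the number of differing frame bits), is only a hypothetical stand-in for the “suitably defined” distance; it is not claimed to be the measure Bledsoe and Browning used, and the two archetypes are invented for the example.

```python
import math
from itertools import product

def is_metric(points, d, tol=1e-9):
    """Brute-force check of the metric axioms on a finite set of points."""
    for x, y in product(points, repeat=2):
        dxy = d(x, y)
        if dxy < -tol:                         # non-negative
            return False
        if (abs(dxy) <= tol) != (x == y):      # zero if and only if x and y coincide
            return False
        if abs(dxy - d(y, x)) > tol:           # same from y to x as from x to y
            return False
    for x, y, z in product(points, repeat=3):
        if d(x, y) + d(y, z) < d(x, z) - tol:  # triangle inequality
            return False
    return True

# Bit positions of a 3 x 3 input frame, treated as points of the plane.
grid = [(i, j) for i in range(3) for j in range(3)]
print(is_metric(grid, math.dist))                          # True
print(is_metric(grid, lambda p, q: math.dist(p, q) ** 2))  # False

def hamming(f, g):
    """Hypothetical frame distance: number of frame bits in which f and g differ."""
    return sum(a != b for a, b in zip(f, g))

def nearest_archetype(frame, archetypes):
    """Label of the archetype at the smallest distance (the first label wins ties)."""
    return min(archetypes, key=lambda label: hamming(frame, archetypes[label]))

# Invented archetypes on the 3 x 3 frame, bits read row by row.
archetypes = {
    "vertical bar":   (0, 1, 0, 0, 1, 0, 0, 1, 0),
    "horizontal bar": (0, 0, 0, 1, 1, 1, 0, 0, 0),
}
print(nearest_archetype((0, 1, 0, 0, 1, 0, 0, 0, 0), archetypes))  # vertical bar
print(nearest_archetype((0, 0, 0, 1, 1, 0, 0, 0, 0), archetypes))  # horizontal bar
```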


OPTIC  RADIATIONS 

The physiological term used to describe the axons of neurons in the lateral geniculate body of the thalamus that terminate in the visual cortex. Distinguished from the “optic fibers” of the optic nerve, which enter the geniculate body from the retina and initiate activity in the optic radiations through synaptic connections in the geniculate layers.


PATTERN

A humanly meaningful frame. This terse definition merely reflects the truism that we are concerned only with machines that interact with the physical world in ways interesting or useful to us.

PERIKARYON

Syn.: Main cell body. A neuron minus its dendrites and axon(s): the part of the nerve cell that contains the nucleus and is the site of significant metabolic functions (Eccles). In the cerebral sensory cortex, activity in the neighborhood of the associated dendrites is the main factor in determining whether an all-or-none discharge occurs in the perikaryon (and is hence propagated down the axon), whereas in spinal motor neurons multiple synaptic junctions typically lie directly on the perikaryon.

PROBABILITY 

Probabilistic notions used in this report have purposely been phrased loosely to preserve generality. In Section III, $p_f(x, t)$ would, strictly speaking, refer to a probability function only if some random-like process were involved, so that repeated runs of the machine with the same sequence of patterns, $t = 1, 2, 3, \ldots$, converge upon $p_f(x, t)$ at the $t$-th trial, $t \to \infty$. If the machine is determinate, with no internal noise sources, such convergence cannot occur, because the sequence of machine states will be identical under identical conditions (a finite-state automaton in the conventional sense); $p_f(x, t)$ must then refer only to our incomplete knowledge of how the device is wired internally. In this case we would require the machine to be so organized internally that the constraints $C_1$, $C_2$ hold in a sense comparable to that characterizing one sequence of trials on the internal-noise-source device. Analytical methods for describing the behavior of determinate machines may range from those derived from the theory of finite-state automata (Kleene; Minsky) to complicated formulations involving continuously ranging variables. Machines with an internal random process may be characterized by finite Markov chains or by discrete Markov processes, depending on whether the transition probabilities between states depend on the value of t (Kemeny and Snell). Although it is perhaps difficult to conceive of a learning machine whose next state depends uniquely on its present one, the digital computers on which so many pattern-learning schemes have been simulated fall into this category unless they include a noise source or a random-number table inaccessible to the programmer. In any event, we reiterate that the text (Sections IV and VI) makes no commitment to any of these alternative positions.
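The distinction drawn here, between a determinate machine and one with an internal noise source, can be illustrated with a toy two-state machine. Everything in the sketch is hypothetical: the transition matrix `P`, the rule that the determinate version always takes its most probable transition, and the trial number `t`. With the noise source, transitions are drawn from `P`, which is fixed and independent of t, so the machine behaves as a finite Markov chain; the relative frequency of a state at trial t over many repeated runs then approaches the probability obtained by applying `P` t times to the starting state.

```python
import random

# Hypothetical two-state machine. P[i][j] = Pr(next state = j | current state = i),
# fixed for all trials t.
P = [[0.9, 0.1],
     [0.4, 0.6]]

def run(steps, rng=None, start=0):
    """Return the machine's state at trials 1..steps.

    With rng=None the machine is determinate: it always takes its most probable
    transition, so identical runs give identical state sequences.
    With an rng it draws each transition from P (an internal noise source).
    """
    state, seq = start, []
    for _ in range(steps):
        if rng is None:
            state = max(range(2), key=lambda j: P[state][j])
        else:
            state = 0 if rng.random() < P[state][0] else 1
        seq.append(state)
    return seq

def exact_distribution(t, start=0):
    """Probability of each state at trial t, from t applications of P."""
    dist = [1.0 if s == start else 0.0 for s in range(2)]
    for _ in range(t):
        dist = [sum(dist[i] * P[i][j] for i in range(2)) for j in range(2)]
    return dist

# A determinate machine repeats itself exactly.
print(run(5) == run(5))  # True

# With internal noise, the relative frequency of state 1 at trial t over many
# repeated runs approaches the exact probability of state 1 at that trial.
rng = random.Random(0)
t, runs = 3, 20000
freq = sum(run(t, rng)[-1] for _ in range(runs)) / runs
print(round(exact_distribution(t)[1], 4))  # 0.175
print(freq)
```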


SIMILARITY  TRANSFORMATION 

A transformation group in the Euclidean plane characterized by translation, rotation, dilation, and reflection. If we associate a complex number with each point in the Euclidean plane, the similarity transformations are the conformal mappings $w = az + b$ or $w = a\bar{z} + b$ (with reflection), where $\bar{z}$ is the complex conjugate of z, a and b are complex, and $a \neq 0$. (These are the only conformal mappings carrying E2 into itself.) The dilation and rotation of such a mapping are specified by a (through its modulus $|a|$ and its argument $\arg a$, respectively), and the translation by b. The plane-into-plane similarity transformations in three dimensions, the $\varphi$ discussed in Section IV, refer to this two-dimensional group plus the additional data needed in any given case to specify the relative position and orientation in E3 of S and the input frame; as a class, however, they do not technically form a group unless both X and X are considered embedded in the abstract Euclidean plane.
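A sketch of the complex-number form of these mappings in Python, with hypothetical values of a and b chosen for illustration. It checks two properties that follow from the formulas: every distance is multiplied by |a|, and the reflected form $w = a\bar{z} + b$ reverses orientation (the sign of a triangle's signed area), while $w = az + b$ preserves it.

```python
import cmath

def similarity(a, b, reflect=False):
    """Map z -> a*z + b, or z -> a*conj(z) + b when reflect is True (a != 0)."""
    if a == 0:
        raise ValueError("a must be non-zero")
    def w(z):
        return a * (z.conjugate() if reflect else z) + b
    return w

def signed_area(p, q, r):
    """Twice the signed area of triangle pqr; positive if p, q, r run counterclockwise."""
    return ((q - p).conjugate() * (r - p)).imag

a = 2 * cmath.exp(1j * cmath.pi / 3)  # |a| = 2 (dilation), arg a = 60 degrees (rotation)
b = 1 - 3j                            # translation
f = similarity(a, b)
g = similarity(a, b, reflect=True)

p, q, r = 0j, 3 + 4j, -1 + 2j
print(abs(f(q) - f(p)) / abs(q - p))  # approximately 2.0, i.e. |a|
print(abs(g(q) - g(p)) / abs(q - p))  # approximately 2.0 for the reflected form too
print(signed_area(p, q, r))           # 10.0
print(signed_area(f(p), f(q), f(r)))  # approximately 40.0: orientation kept, area scaled by |a|**2
print(signed_area(g(p), g(q), g(r)))  # approximately -40.0: orientation reversed
```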

SYNAPSE

A junction between the axon of one neuron and a dendrite of another, across which all-or-none spike discharge activity may travel. Some kind of summing and threshold function is commonly attributed to synaptic junctions in the central nervous system in order to obtain non-linear discrimination between varying numbers of impulses arriving at one dendritic tree. In spinal motor neurons the important synapses occur directly on the perikaryon.
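The summing and threshold function mentioned above is often idealized as a unit that adds its weighted incoming impulses and produces an all-or-none output when the sum reaches a threshold. The Python sketch below assumes exactly that idealization; the weights and threshold are hypothetical, and nothing here is claimed about actual synaptic physiology. It shows the non-linear discrimination between numbers of arriving impulses: the output jumps from 0 to 1 once enough impulses coincide.

```python
def fires(impulses, weights, threshold):
    """All-or-none output: 1 if the weighted sum of impulses reaches threshold, else 0."""
    total = sum(w * s for w, s in zip(weights, impulses))
    return 1 if total >= threshold else 0

weights = [1, 1, 1, 1]  # four equally weighted afferents (hypothetical)
threshold = 3           # hypothetical threshold

# Output as a function of how many of the four afferents deliver an impulse.
responses = [fires([1] * k + [0] * (4 - k), weights, threshold) for k in range(5)]
print(responses)  # [0, 0, 0, 1, 1]
```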

TECTUM

Syn.: Colliculus. A phylogenetically very old subcortical region. In frogs, reptiles, and birds the primary visual projections are found here, but in man they have been entirely displaced to the visual cortex.

TOPOLOGICAL MAPPING

A transformation on a set or space which carries neighborhoods of each point into neighborhoods of uniquely corresponding points. In the classical topology of infinite sets, a neighborhood is often defined as any open set containing the point in question, explicitly allowing such sets to be very “small.” But such topological spaces often have no notion of “distance,” not even as an undefined primitive, so that complete distance ordering is difficult to pin down. Further, specializing the most abstract topological spaces to the finite-set case requires supporting machinery irrelevant to our present purposes. The approach adopted in this report has the virtue of requiring a minimum of preliminary mathematical argument, which is possible because we assume that the Euclidean spaces E2 and E3 (which are also topological spaces) are unavoidable in actual pattern-learning mechanisms, whether machines or mammalian visual systems. We nevertheless avoid overt commitment to the Euclidean metric in the earlier stages of the argument by tacitly associating with each of E2 and E3 a related set of completely ordered distances between pairs of points, as described under complete ordering, and by couching most of the arguments in Section IV in terms of inequalities or equalities between such distances. We thus make use of some of the topological properties of Euclidean space without becoming involved with topology as such.
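The idea of couching arguments in inequalities and equalities between distances can be illustrated by testing whether a mapping preserves every such comparison. The sketch below takes one possible reading: for all pairs of points {p, q} and {r, s}, d(p, q) must compare with d(r, s) exactly as the distances between their images compare. This reading, the helper names, and both test mappings are assumptions for illustration; the report's own definition appears under its separate entry on complete ordering. A rotation with dilation and translation (a similarity transformation) passes; exchanging two points of the grid does not.

```python
import math
from itertools import combinations

def compare(u, v, tol=1e-9):
    """Return -1, 0 or 1 according as u < v, u == v (within tol), or u > v."""
    if abs(u - v) <= tol:
        return 0
    return -1 if u < v else 1

def preserves_distance_order(points, f, tol=1e-9):
    """True if f keeps every inequality and equality between pairwise distances."""
    pairs = list(combinations(points, 2))
    for (p, q), (r, s) in combinations(pairs, 2):
        before = compare(math.dist(p, q), math.dist(r, s), tol)
        after = compare(math.dist(f(p), f(q)), math.dist(f(r), f(s)), tol)
        if before != after:
            return False
    return True

grid = [(i, j) for i in range(3) for j in range(3)]

def similar(p, theta=0.5, scale=2.0, shift=(5.0, -1.0)):
    """Hypothetical similarity: rotate by theta, dilate by scale, translate by shift."""
    c, s = math.cos(theta), math.sin(theta)
    return (scale * (c * p[0] - s * p[1]) + shift[0],
            scale * (s * p[0] + c * p[1]) + shift[1])

def swap(p):
    """Hypothetical scrambling: exchange grid points (0, 0) and (1, 1), fix the rest."""
    return {(0, 0): (1, 1), (1, 1): (0, 0)}.get(p, p)

print(preserves_distance_order(grid, similar))  # True
print(preserves_distance_order(grid, swap))     # False
```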

VISUAL  CORTEX 

Syn.: Striate Area, Area Striata. The primary visual cortical projection area in mammals. The term “striate” derives from the fact that, of all cortical areas, this region is the most regular as well as the most clearly laminated in microscopic section. It is located at the rear of the brain and, in man, is partially concealed within the fissure separating the two cerebral hemispheres. The visual cortex occurs in two distinct parts, one in each hemisphere, each of which receives information from one half of the visual field originating in both eyes, the signals from the two eyes coming together in the lateral geniculate body. The division between the halves is a vertical line through the center of vision. In man, the striate area receives fibers only from the lateral geniculate body and, of all other cortical regions, sends axons only to the surrounding parastriate area, although subcortical efferents to the superior colliculus facilitate oculomotor activity. In the conscious human subject, electrical stimulation by a fine electrode implanted in the optic radiations (Walter), as well as by a probe touching the striate cortex (Krieg), gives rise to punctate light sensations localized in space, while similar external stimulation of the parastriate area has been reported to cause perceptions of forms without spatial orientation, and lesions there dramatically impair pattern-recognition performance (Krieg). The subjective experience of visual fusion of both halves of the visual field in normal subjects is apparently facilitated by cross-connecting fibers joining the parastriate areas in opposite hemispheres, as well as by callosal connections (those between cerebral hemispheres) joining higher areas; there are no such cross connections between the striate regions. Striate and parastriate correspond to areas 17 and 18 in Brodmann’s architectonic scheme.

X 

The set of elements x, called “input frame bits,” in terms of which a visual pattern or frame is quantized.

X 

The set of machine elements x connected to the input frame and presumed to have some sort of interaction capability which can change with time as successive patterns are presented to the input frame.